Data Structures


and 
Algorithms 


Concepts, Techniques and Applications 








The McGraw-Hill Companies 





The Author 


G A Vijayalakshmi Pai, Ph D, is Assistant Professor of Computer Applications at PSG College of 
Technology, Coimbatore, India. She has over 20 years of experience in teaching graduate students, 
besides research. Her research interests span Computational Intelligence, Computational Finance 
and Pattern Recognition. Recipient of the AICTE Career Award for Young Teachers, 2001, 
awarded by the All India Council of Technical Education, New Delhi, she has published around 
40 papers in various international and national journals and conferences, and has also been the 
investigator for many research projects. 

She is also the adaptation author for the Schaum’s Outlines Series book on Data Structures by
Lipschutz, published by McGraw-Hill Education (India) Ltd., New Delhi. She can be reached at
vijipai@vsnl.com


Data Structures


and 
Algorithms 


Concepts, Techniques and Applications 





G A Vijayalakshmi Pai 


Department of Computer Applications 
PSG College of Technology 
Coimbatore 




Tata McGraw-Hill Publishing Company Limited 
NEW DELHI 
McGraw-Hill Offices 
New Delhi New York St Louis San Francisco Auckland Bogota Caracas 
Kuala Lumpur Lisbon London Madrid Mexico City Milan Montreal 
San Juan Santiago Singapore Sydney Tokyo Toronto 










Tata McGraw-Hill 


Published by the Tata McGraw-Hill Publishing Company Limited, 
7 West Patel Nagar, New Delhi 110 008. 


Copyright © 2008, by Tata McGraw-Hill Publishing Company Limited. 

No part of this publication may be reproduced or distributed in any form or by any means, electronic, mechanical, 
photocopying, recording, or otherwise or stored in a database or retrieval system without the prior written permission of 
the publishers. The program listings (if any) may be entered, stored and executed in a computer system, but they may not 
be reproduced for publication. 


This edition can be exported from India only by the publishers, 
Tata McGraw-Hill Publishing Company Limited 


ISBN (13): 978-0-07-066726-6 
ISBN (10): 0-07-066726-8 


Managing Director: Ajay Shukla 


General Manager: Publishing—SEM & Tech Ed: Vibha Mahajan 
Asst. Sponsoring Editor: Shalini Jha 

Editorial Executive: Nilanjan Chakravarty 

Executive—Editorial Services: Sohini Mukherjee 

Senior Proof Reader: Suneeta S Bohra 


General Manager: Marketing—Higher Education & School: Michael J Cruz 
Product Manager: SEM & Tech Ed: Biju Ganesan 


Controller—Production: Rajender P Ghansela 
Asst. General Manager—Production: B L Dogra 


Information contained in this work has been obtained by Tata McGraw-Hill, from sources believed to be reliable. 
However, neither Tata McGraw-Hill nor its authors guarantee the accuracy or completeness of any information 
published herein, and neither Tata McGraw-Hill nor its authors shall be responsible for any errors, omissions, or 


damages arising out of use of this information. This work is published with the understanding that Tata McGraw-Hill 
and its authors are supplying information but are not attempting to render engineering or other professional services. 
If such services are required, the assistance of an appropriate professional should be sought. 





Typeset at The Composers, 260, C.A. Apt., Paschim Vihar, New Delhi 110 063 and printed at 
Pashupati Printers, 429/16, Gali No. 1, Friends Colony, Industrial Area, GT Road, Shahdara, Delhi 110 095 


Cover Printer: Rashtriya Printers 


RQOLCRRLXRAQQX 







ADVANCE PRAISE





“For understanding data structure concepts, this book will be of great help to the 
students because of its simplicity and self-explanatory examples.” 

Bhupesh Deka 

Sikkim Manipal Institute of Technology


“Pseudocode-based algorithms are provided so that implementation can be done 
using any programming language.” 

Jibi Abraham 

M. S. Ramaiah Institute of Technology 


“The presentation and the working of the algorithms is very fresh and unique. This 
style will be appreciated both by teachers and students alike.” 

Dr. T. V. Gopal 

Anna University, Chennai 


“The comparisons between the different types of linked lists are good.” 
Dr. M. P. Sebastian 
National Institute of Technology, Calicut 


MORE FROM THE REVIEWERS





“Inclusion of ADT at the end of chapters is a good feature. Also, the programming 
assignments will help the faculty in teaching the subject.” 


“The presentation of this text is very effective and better than the other popular 
books. The writing style is very precise and effective.” 


“I can assure you, I will definitely refer to this text.” 


“A wide variety of examples and exercises are given. A Level Order Traversal, for 
example, is a good exercise and generally missing in other texts.” 


“The major strengths are that its description is extremely clear and readable; its 
organization is excellent, and its exercises are motivating.” 


“I would definitely adopt this book and recommend it to my students and friends. It 
is a nice book that covers all the topics and the advanced topics as well. Any student 
who would want to master the subject of data structures must read this book.” 







In fond memory of my father, Prof. G A Krishna Pai 


“one of the greatest lessons I have learnt in my life is to pay as much attention to 
the means of work as to its end... 

I have been always learning great lessons from that one principle and it appears to be 
that all the secret of success is there; to pay as much attention to the means as to the 
end... 

...With the means alright the end must come....” 


—Swami Vivekananda 
(Delivered at Los Angeles, California, 4 Jan, 1900) 


PREFACE





Efficient problem-solving using computers, irrespective of the discipline or application, calls for 
the design of efficient algorithms. Inclusion of appropriate data structures is of critical importance 
to the design of efficient algorithms. In other words, good algorithm design must go hand in hand 
with appropriate data structures for efficient program design to solve a problem. 


Data structures is a fundamental course in Computer Science which most undergraduate and 
graduate programmes in Computer Science, Computer Science and Engineering, and other allied 
engineering disciplines such as Computer Integrated Manufacturing, Product Design and 
Commerce and Communication Engineering, to list a few, offer during the first year or first
semester of the programme. It is offered as a core or an elective course, enabling students to have 
the much needed foundation for efficient programming, leading to better problem-solving in their 
respective disciplines. Besides regular academic programmes, training programmes of the IT 
corporate sector and other institutes also offer a course on data structures either by way of 
certificate courses, diploma or post-diploma programmes. 


Most of the well-known textbooks/monographs on this subject have discussed the concepts in 
relation to a programming language—beginning with Pascal and spanning a spectrum of them 
such as C, C++, C#, Java, and so on—essentially calling for a fair knowledge of the language, 
before one proceeds to understand the data structure. There does remain a justification in this, 
when one argues that the implementation of data structures in a specific programming language 
needs to be demonstrated or that the algorithms pertaining to the data structure need a 
convenient medium of presentation and when this is so, why not a programming language? 


Again, while some authors have insisted on using their books for an advanced level course, 
there are some who insist on a working knowledge of the specific programming language as a 
pre-requisite to using the book. However, in the case of a core course, as it is in most academic 
programmes, it is not uncommon for a novice or a sophomore, to be bewildered by the ‘miles of 
code’ that demonstrate or explain a data structure, rendering the subject difficult. In fact, the 
effort that one needs to put in to comprehend the data structure and its applications, is distracted 
by the necessity to garner sufficient programming knowledge to follow the code. It is indeed 
ironical that while a novice is taught data structures to appreciate programming, in reality it turns 
out that one learns programming to appreciate data structures! 


In my decades of experience of offering the course to graduate programmes which admit
students from heterogeneous undergraduate disciplines, with little or no strong knowledge of
programming, I had several occasions to observe this malady.







In fact, it is not uncommon for some academic programmes, especially graduate programmes
which due to their shorter duration run courses in Programming and Data Structures in
parallel in the same semester (much to the chagrin of the novice learner), to force a novice
to learn data structures through their implementation in a specific programming language. In
reality, the learning ought to come first and be augmented with implementation of the data
structures; failure to do so has been the reason behind the fallout.


A solution to this problem would be to (i) frame the course such that the theory deals with the 
concepts, techniques and applications of data structures, not taking recourse to any specific 
programming language, but instead settling for a pseudo-language which clearly expounds the 
data structure and supplementing the course material with illustrative problems and exercises to 
reinforce the students’ grasp of the concepts, and (ii) augment the theory with laboratory sessions
to enable the student to implement the data structure in itself or as embedded in an application, in
a language of his/her own choice or as insisted upon in the curriculum. This would enable the 
student who has acquired sufficient knowledge and insight into the data structures, to appreciate 
the beauty and the merits of employing the data structure by programming it himself or herself, 
rather than ‘look’ for the data structure in a pre-written code. 


This means that textbooks catering to the fundamental understanding of the data structure 
concepts for use as course material in the classroom are as much needed as those books which 
cater to the implementation of data structures in a programming language for use in the 
laboratory sessions. While most books in the market conform to the latter, to bring out a book for 
use as classroom course material by instructors handling a course on data structures and 
comprehensive enough for the novice students to benefit, has been the main motivation in writing 
this book. In this direction, the book details concepts, techniques and applications pertaining to 
data structures, independent of any programming language, discusses several illustrative 
problems and poses review questions to reinforce the understanding of the theory, and presents 
a suggestive list of programming assignments to aid implementation of the data structures. In 
fact, the book may be independently used as a textbook since it is self-contained or serves as a 
companion for books discussing data structures implemented in a specific programming 
language such as C, C++, Java, etc. 


The book lays an all-round emphasis on Theory, Applications, Illustrative Problems, Review
Questions and Programming Assignments to enable students to comprehend, implement and
appreciate data structures. The whole book is divided into five parts.


As an introduction, the need for data structures and some basic concepts pertaining to the analysis
of algorithms, which are essential to appreciate the algorithms associated with data structures, have
been presented in chapters 1-2.


Part I details sequential linear data structures, viz., arrays, stacks, queues, priority queues and
deques, and comprises chapters 3-5. Part II details linked linear data structures, viz., linked lists,
linked stacks and linked queues, and comprises chapters 6-7. Part III elucidates the nonlinear data 
structures of trees, binary trees and graphs covering chapters 8-9. Part IV highlights the advanced 
data structures of binary search trees, AVL trees, B trees, tries, red black trees, splay trees, hash tables 
and files, which spans chapters 10-14. Part V spans chapters 15-17 and discusses searching 
algorithms of linear search, transpose sequential search, interpolation search, binary search, Fibonacci 
search, and other search techniques, and internal sorting techniques of bubble sort, insertion sort, 
selection sort, merge sort, shell sort, quick sort, heap sort and radix sort, and external sorting techniques 
of sorting with tapes, sorting with disks, polyphase merge sort and cascade merge sort. 







The concepts and techniques behind each data structure and their applications have been 
explained. Every chapter includes a variety of Illustrative Problems pertaining to the data 
structure(s) detailed, a summary of the technical content of the chapter and a list of Review 
Questions, to reinforce the comprehension of the concepts. A set of Programming Assignments to
be implemented in the laboratory sessions has also been listed at the end of the appropriate
chapters.


The book could be used either as an introductory or as an advanced-level textbook for the
undergraduate, graduate and research programmes which offer data structures as a core or an 
elective course. While the book is primarily meant to serve as a course material for use in the 
classroom, it could be used as a companion guide during the laboratory sessions to nurture better 
understanding of the theoretical concepts. 


The book could also serve as a course material for various diploma, post-diploma programmes 
and certificate courses conducted by various IT and related institutes and corporate sectors. 


An introductory level course for a duration of one semester, targeting an undergraduate 
programme or a first-year graduate programme or a diploma programme or a certificate course, 
could include chapters 1-2, PART I, PART II, chapter 8 of PART III, chapter 13 of PART IV,
chapter 15 (Sec. 15.1-15.2, 15.5) and chapter 16 (Sec. 16.1-16.3, 16.5, 16.7) of PART V in its 
curriculum. 


A middle-level course for a duration of one semester, targeting senior graduate-level 
programmes and research programmes such as MS/Ph D, could include chapters 1-2, PART I, 
PART II, PART III, chapters 10, 11 and 13 of PART IV, and selective sections of chapters 15-16 of 
PART V. 


An advanced-level course could include parts IV and V besides selections from the rest, based 
on the prerequisite courses satisfied. 


Chapters 8, 10, 11 (Sec. 11.1-11.3), 13, 14 and 17 could be useful for inclusion in a curriculum
that serves as a prerequisite for a course on Database Management Systems. 


The salient features of the book are as follows: 

• All-round emphasis on theory, problems, applications and programming assignments
• Simple and lucid explanation of the theory
• Inclusion of several applications to illustrate the use of data structures
• Several worked-out examples as Illustrative Problems in each chapter
• List of Programming Assignments at the end of each chapter
• Review Questions to strengthen understanding
• Self-contained text for use as a textbook for either an introductory or advanced-level course


The book is accompanied by a web supplement that can be accessed at www.mhhe.com/pai/dsa. 
It includes the following online material: 


• Slide Presentation
The slides, illustrative of the technical content in each chapter of the book, could be
effectively used by the instructor to supplement classroom teaching.

• Solution Manual
Solutions to selected problems in each chapter are given here.

• C Programs
C implementations of algorithms for selected data structures discussed in the book are
given here.







I express my sincere thanks to the Management and Principal, PSG College of Technology, 
Coimbatore, for the encouragement and support provided by them. I also express my 
appreciation for the editorial and production teams of McGraw-Hill Education (India) Limited, 
New Delhi, for the excellent production values. 


Thanks are also due to all the reviewers who went through the text and provided noteworthy 
suggestions and advice. Their names are listed below. 


Bhupesh Deka
Department of Computer Science Engineering,
Sikkim Manipal Institute of Technology, East Sikkim

Basudev Halder
Department of Computer Science Engineering,
Institute of Technology & Marine Engineering College, Jingha

Amitava Nag
Department of Computer Science & Engineering,
Academy of Technology, Hooghly

S.R. Biradar
Department of Computer Science Engineering,
Sikkim Manipal Institute of Technology, East Sikkim

Debasis Chakroborty
Department of Computer Science,
Assansol Engineering College, Assansol

N. K. Kamila
Department of Computer Science,
C.V Raman College of Engineering, Bhubaneswar

Sanjay Goswami
Department of Computer Applications,
Narula Institute of Technology, Kolkata

Sanjoy Kumar Saha
Department of Computer Science and Engineering,
Jadavpur University, Kolkata

P. Sampath
Computer Science and Engineering,
Bannari Amman Institute of Technology, Sathyamangalam

T.V. Gopal
Department of Computer Science and Engineering,
Anna University, Chennai

Suganthi Jeyaraj
Department of Computer Science and Engineering,
PSG College of Technology, Coimbatore

Jibi Abraham
Department of Computer Science and Engineering,
M.S. Ramaiah Institute of Technology, Bangalore

T. Ramesh
Department of Computer Science,
National Institute of Technology, Warangal

Sameer Bhave
Department of Computer Engineering,
IPS Inst of Engineering and Sciences, Indore

Prashant Lakkadwala
Department of Computer Engineering,
Venakateshwar Engineering College, Indore

Manish Manoria
Department of Computer Engineering,
Truba College of Science and Technology, Bhopal

Abhay Kothari
Sanghvi Institute of Management and Sciences, Indore







Sachin Tripathi
Department of Computer Science Engineering,
Indian School of Mines, Dhanbad

R K Gupta
Department of Computer Engineering,
Madhav Institute of Science and Technology, Gwalior

H N Verma
Department of Computer Science,
Sanjay Institute of Engineering and Management, Mathura

Dilkeshwar Pandey
Department of Computer Science and Engineering,
ABES Engineering College

Lalitsen Sharma
Department of Computer Science,
University of Jammu, Jammu

Bhavna Jain
Department of Electronics Engineering,
Hitkarni College of Engineering, Jabalpur

Akhilesh Kumar Srivastava
CSE Department, Inderprastha Engineering College,
Ghaziabad

Shashank Dwivedi
Department of Computer Science,
United College of Engineering and Research, Allahabad

Sanjay Kumar Pandey
Department of Computer Science,
United College of Engineering and Research, Allahabad

Rajiv Pandey
Department of Computer Science and Engineering,
Amity University, Lucknow

Mayank Aggarwal
Department of Computer Science and Engineering,
Faculty of Engineering and Technology,
Gurukul Kangri Vishwavidyalaya, Haridwar

Amit Jain
Computer Science/Information Technology,
Radha Govind Engineering College, Meerut

Nilima Fulmare
Hindustan College of Science and Technology, Agra


I would like to place on record my reverence for my mother whose blessings and prayers have 
been a source of inspiration and great strength. True to the Indian spiritual tradition, I offer my 
reverent salutations to my spiritual guru Srimat Swami Vireswaranandaji Maharaj, the tenth 
President of the Ramakrishna Math and Mission. Lastly, the infinite support, encouragement and 
help provided by my sisters Rekha and Udaya in all my endeavors, are affectionately 
remembered. 


While I hope that the book would be beneficial to novices and sophomores alike, constructive 
feedback and suggestions for improvement may kindly be mailed to vijipai@vsnl.com 


G A V Pai




CONTENTS 





Advance Praise v
More from the Reviewers vi
Preface ix
1. Introduction 1 


1.1 History of Algorithms 2 

1.2 Definition, Structure and Properties of Algorithms 3 
1.3 Development of an Algorithm 4 

1.4 Data Structures and Algorithms 4 

1.5 Data Structure—Definition and Classification 5 
Summary 7 


2. Analysis of Algorithms 8 
2.1 Efficiency of Algorithms 8 
2.2  Apriori Analysis 9 
2.3 Asymptotic Notations 11 
2.4 Time Complexity of an Algorithm Using O Notation 12 
2.5 Polynomial Vs Exponential Algorithms 12 
2.6 Average, Best and Worst Case Complexities 13 
2.7 Analyzing Recursive Programs 15 
Summary 19 
Illustrative Problems 20 
Review Questions 25 


Part I 


3. Arrays 26 
3.1 Introduction 26 
3.2 Array Operations 27 
3.3 Number of Elements in an Array 27 
3.4 Representation of Arrays in Memory 28 
3.5 Applications 32 
Summary 34 
Illustrative Problems 35 
Review Questions 37 
Programming Assignments 37 





4. Stacks 39
4.1 Introduction 39
4.2 Stack Operations 40
4.3 Applications 43
Summary 48
Illustrative Problems 49
Review Questions 54
Programming Assignments 55


5. Queues 56
5.1 Introduction 56
5.2 Operations on Queues 57
5.3 Circular Queues 62
5.4 Other Types of Queues 66
5.5 Applications 71
Summary 75
Illustrative Problems 76
Review Questions 81
Programming Assignments 82


Part II


6. Linked Lists 84
6.1 Introduction 84
6.2 Singly Linked Lists 87
6.3 Circularly Linked Lists 93
6.4 Doubly Linked Lists 98
6.5 Multiply Linked Lists 103
6.6 Applications 105
Summary 112
Illustrative Problems 113
Review Questions 119
Programming Assignments 121


7. Linked Stacks and Linked Queues 123
7.1 Introduction 123
7.2 Operations on Linked Stacks and Linked Queues 124
7.3 Dynamic Memory Management and Linked Stacks 130
7.4 Implementation of Linked Representations 132
7.5 Applications 133
Summary 137
Illustrative Problems 137
Review Questions 148
Programming Assignments 149


Part III


8. Trees and Binary Trees 151
8.1 Introduction 151
8.2 Trees: Definition and Basic Terminologies 151 
8.3 Representation of Trees 153 

8.4 Binary Trees: Basic Terminologies and Types 155 
8.5 Representation of Binary Trees 156 

8.6 Binary Tree Traversals 158 

8.7 Threaded Binary Trees 167 

8.8 Application 169 

Summary 175 

Illustrative Problems 175 

Review Questions 184 

Programming Assignments 185 


9. Graphs 186
9.1 Introduction 186 

9.2 Definitions and Basic Terminologies 187 

9.3 Representations of Graphs 195 

9.4 Graph Traversals 199 

9.5 Applications 203 

Summary 209 

Illustrative Problems 209 

Review Questions 214 

Programming Assignments 216 


Part IV 


10. Binary Search Trees and AVL Trees 218
10.1 Introduction 218 

10.2 Binary Search Trees: Definition and Operations 218 

10.3 AVL Trees: Definition and Operations 228 

10.4 Applications 243 

Summary 246 

Illustrative Problems 247 

Review Questions 259 

Programming Assignments 260 


11. B Trees and Tries 262
11.1 Introduction 262 

11.2 m-way search trees: Definition and Operations 262 

11.3 B Trees: Definition and Operations 269 

11.4 Tries: Definition and Operations 277 

11.5 Applications 281 

Summary 284 

Illustrative Problems 285 

Review Questions 290 

Programming Assignments 292 


12. Red-Black Trees and Splay Trees 293
12.1 Red-Black Trees 293 
12.2 Splay Trees 311 




12.3 Applications 318 
Summary 319 

Illustrative Problems 319 
Review Questions 329 
Programming Assignments 330 


13. Hash Tables 331
13.1 Introduction 331 

13.2 Hash Table Structure 332 
13.3 Hash Functions 333 

13.4 Linear Open Addressing 334 
13.5 Chaining 339 

13.6 Applications 342 

Summary 346 

Illustrative Problems 347 

Review Questions 351 

Programming Assignments 352 


14. File Organizations 353
14.1 Introduction 353 

14.2 Files 354 

14.3 Keys 355 

14.4 Basic File Operations 356 

14.5 Heap or Pile Organization 356 

14.6 Sequential File Organisation 357 

14.7 Indexed Sequential File Organization 358 
14.8 Direct File Organization 363 

Illustrative Problems 365 

Summary 369 

Review Questions 370 

Programming Assignments 371 


Part V 


15. Searching 373
15.1 Introduction 373 

15.2 Linear Search 373 

15.3 Transpose Sequential Search 375 
15.4 Interpolation Search 376 

15.5 Binary Search 378 

15.6 Fibonacci Search 381 

15.7 Other Search Techniques 384 
Summary 385 

Illustrative Problems 386 

Review Questions 391 

Programming Assignments 393 


16. Internal Sorting 394
16.1 Introduction 394 







16.2 Bubble Sort 395 
16.3 Insertion Sort 396 
16.4 Selection Sort 399 
16.5 Merge Sort 401 
16.6 Shell Sort 405 
16.7 Quick Sort 410 
16.8 Heap Sort 414 
16.9 Radix Sort 422 
Summary 426 

Illustrative Problems 426 
Review Questions 433 
Programming Assignments 434 


17. External Sorting 435
17.1 Introduction 435 

17.2 External Storage Devices 436 

17.3 Sorting with Tapes: Balanced Merge 438 
17.4 Sorting with Disks: Balanced Merge 441 
17.5 Polyphase Merge Sort 445 

17.6 Cascade Merge Sort 447 

Summary 449 

Illustrative Problems 449 

Review Questions 455 

Programming Assignments 456 


Index 457 




Visual Walkthrough 





The book is conveniently organized into five parts to favor selection of topics suiting the level
of the course offered.

[Sample page: table of contents]


Each chapter lists the topics covered.

[Sample page: opening of the chapter on Linked Lists]

LINKED LISTS

Topics: 6.1 Introduction; 6.2 Singly Linked Lists; 6.3 Circularly Linked Lists; 6.4 Doubly Linked
Lists; 6.5 Multiply Linked Lists; 6.6 Applications

In Part I of the book we dealt with arrays, stacks and queues which are linear sequential data
structures (of these, stacks and queues have a linked representation as well, which will be
discussed in Chapter 7).

In this chapter we detail linear data structures having a linked representation. We first list the
demerits of the sequential data structure before introducing the need for a linked representation.
Next, the linked data structures of singly linked list, circularly linked list, doubly linked list
and multiply linked list are elaborately presented. Finally, two problems, viz., Polynomial
addition and Sparse matrix representation, demonstrating the application of linked lists are
discussed.


6.1 Introduction


Drawbacks of sequential data structures 


Arrays are fundamental sequential data structures. Even stacks and queues rely on arrays for their 
representation and implementation. However, arrays or sequential data structures in general, 
suffer from the following drawbacks: 

(i) inefficient implementation of insertion and deletion operations and 

(ii) inefficient use of storage memory. 

Let us consider an array A[1 : 20]. This means a contiguous set of twenty memory locations 
have been made available to accommodate the data elements of A. As shown in Fig. 6.1(a), let us 
suppose the array is partially full. Now, a new element 108 cannot be inserted in the position
indicated without displacing the neighbouring data elements from their positions.
Methods such as making use of a temporary array (B) to hold the data elements of A with 108 
inserted at the appropriate position or making use of B to hold the data elements of A which 
follow 108, before copying B into A, call for extensive data movement which is computationally 
expensive. Again, attempting to delete 217 from A calls for the use of a temporary array B to hold 
the elements with 217 excluded, before copying B to A. (Fig. 6.1) 
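
The cost of this data movement can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: it shifts elements in place rather than through the temporary array B described above, it uses 0-based indexing instead of A[1 : 20], and the function names `insert_with_shift` and `delete_with_shift` as well as the sample contents of the array (other than 108 and 217) are hypothetical.

```python
def insert_with_shift(array, count, position, value):
    """Insert value at index `position` of a fixed-size list holding `count` elements.

    Returns the number of elements that had to be moved.
    """
    if count >= len(array):
        raise OverflowError("array is full")
    if not 0 <= position <= count:
        raise IndexError("position out of range")
    moves = 0
    # Shift every element from `position` onwards one slot to the right,
    # starting from the last occupied slot so nothing is overwritten.
    for i in range(count, position, -1):
        array[i] = array[i - 1]
        moves += 1
    array[position] = value
    return moves


def delete_with_shift(array, count, position):
    """Delete the element at index `position`; returns the number of moves."""
    if not 0 <= position < count:
        raise IndexError("position out of range")
    moves = 0
    # Shift every element after `position` one slot to the left.
    for i in range(position, count - 1):
        array[i] = array[i + 1]
        moves += 1
    array[count - 1] = None  # the last occupied slot becomes vacant
    return moves


# A partially full array of twenty locations (sample data is assumed).
A = [34, 217, 56, 78] + [None] * 16
insert_moves = insert_with_shift(A, 4, 1, 108)   # insert 108 before 217
delete_moves = delete_with_shift(A, 5, 2)        # delete 217
```

Either way, the number of moves grows with the number of elements that follow the affected position, which is the inefficiency the text refers to.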


Chapter-end summary for use as quick reference.

Summary


Hash tables are ideal data structures for dictionaries. They favor efficient storage and 
retrieval of data lists which are linear in nature. 

A hash function is a mathematical function which maps keys to positions in the hash tables 
known as buckets. The process of mapping is called hashing. Keys which map to the same
bucket are called synonyms. In such a case a collision is said to have occurred. A bucket
may be divided into slots to accommodate synonyms. When a bucket is full and a synonym 
is unable to find space in the bucket then an overflow is said to have occurred. 

The characteristics of a hash function are that it must be easy to compute and at the same 
time minimize collisions. Folding, truncation and modular arithmetic are some of the 
commonly used hash functions. 

A hash table could be implemented using a sequential data structure such as arrays. In 
such a case, the method of handling overflows where the closest slot that is vacant is 
utilized to accommodate the synonym key is called linear open addressing or linear 
probing. However, in course of time, linear probing can lead to the problem of clustering 
thereby deteriorating the performance of the hash table to a mere sequential search! 
The other alternative methods of handling overflows are rehashing, quadratic probing and 
random probing. 
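
As a rough illustration of linear open addressing, the following Python sketch stores integer keys in an array of single-slot buckets and resolves collisions by moving to the next vacant bucket. The class name `LinearProbingTable`, the choice of `key % size` as the hash function (a simple instance of modular arithmetic) and the sample keys are assumptions made for this sketch, not the book's own listing.

```python
class LinearProbingTable:
    """Hash table on a fixed-size array, one slot per bucket, linear probing."""

    def __init__(self, size):
        self.size = size
        self.buckets = [None] * size

    def _home(self, key):
        # Hash function by modular arithmetic (assumed for this sketch).
        return key % self.size

    def insert(self, key):
        """Store key; return the number of buckets probed."""
        index = self._home(key)
        for probes in range(1, self.size + 1):
            if self.buckets[index] is None or self.buckets[index] == key:
                self.buckets[index] = key
                return probes
            index = (index + 1) % self.size  # try the next bucket, wrapping around
        raise OverflowError("hash table is full")

    def search(self, key):
        """Return the bucket index holding key, or -1 if it is absent."""
        index = self._home(key)
        for _ in range(self.size):
            if self.buckets[index] is None:
                return -1
            if self.buckets[index] == key:
                return index
            index = (index + 1) % self.size
        return -1


table = LinearProbingTable(7)
probe_counts = [table.insert(k) for k in (10, 17, 24)]  # all three are synonyms (home bucket 3)
```

Here the three synonyms occupy buckets 3, 4 and 5, and each later insertion needs one more probe than the previous one; this growing run of occupied buckets is the clustering effect mentioned above.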







Illustrative Problems


Problem 15.1 For the list CHANNELS={ AAXN, ZZEE, , CCNN, DDDN HHBO, GGOD, 
»e sequential search for the 



























External Sorting 


Q) Summary 


> External sorting deals with sorting of files or lists that are too huge to be accommodated 
in the internal memory of the computer and hence need to be stored in external storage 
devices such as disks or drums. 





> The principle behind external sorting is to first make use of any efficient internal sorting 
technique to generate runs. These runs are then merged in passes to obtain a single run at 
which stage the file is deemed sorted. The merge patterns called for by the strategies, are 
influenced by external storage medium on which the runs reside, viz., disks or tapes. 


v 


Magnetic tapes are sequential devices built on the principle of audio tape devices. Data is 
stored in blocks occurring sequentially. Magnetic disks are random access storage devices. 
Data stored in a disk is addressed by its cylinder, track and sector numbers. 


Extensive Illustrative Problems throughout

> Balanced merge sort is a technique that can be adopted on files residing on both disks and
tapes. In its general form, a k-way merging could be undertaken during the runs. For the
efficient management of merging runs, buffer handling and selection tree mechanisms are
employed.


> Balanced k-way merge sort on tapes calls for the use of 2k tapes for an efficient
management of runs. Polyphase merge sort is a clever strategy that makes use of only (k+1)
tapes to perform the k-way merge. The distribution of runs on the tapes follows a
Fibonacci number sequence.


> Cascade merge sort is yet another smart strategy which, unlike polyphase merge sort, does
not employ a uniform merge pattern. Each pass makes use of a 'cascading' sequence of
merge patterns.


Illustrative Problems 








Problem 17.1 The specification for a typical disk storage system is shown in Table 17.1.
An employee file consisting of 100,000 records is stored on the disk. The employee record structure 
and the size of the fields in bytes (shown in brackets) are given below: 








Field               Size (bytes)
Employee number          6
Employee name           20
Designation             10
Address                 30
Basic pay                6
Allowances              20
Deductions              20
Total salary             6

(a) What is the storage space (in terms of bytes) needed to store the employee file in the disk?
(b) What is the storage space (in terms of cylinders) needed to store the employee file in the
disk?
Solution: 
(a) The size of the employee record = 118 bytes 
Number of employee records that can be held in a sector = ⌊512/118⌋ = 4 records
Number of sectors needed to hold the whole employee file = 100000/4 = 25,000 sectors 
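
The arithmetic in the steps above can be reproduced with a short sketch. The field sizes and the 512-byte sector are taken from the problem and the solution as stated; the variable names are chosen for this illustration, and the sketch assumes that a record is never split across two sectors.

```python
import math

# Field sizes in bytes, in the order listed in the record structure.
FIELD_SIZES = [6, 20, 10, 30, 6, 20, 20, 6]
SECTOR_SIZE = 512      # bytes per sector, as used in the solution
NUM_RECORDS = 100_000  # records in the employee file

record_size = sum(FIELD_SIZES)
# Assumption: records do not span sectors, so a partial record is discarded.
records_per_sector = SECTOR_SIZE // record_size
sectors_needed = math.ceil(NUM_RECORDS / records_per_sector)

print(record_size, records_per_sector, sectors_needed)
```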
















Review Questions include objective-type, short-answer and long-answer type questions.

Review Questions












1. A minimal superkey is in fact a ______
(a) secondary key (b) primary key (c) non key (d) none of these 
2. State whether true or false: 
(i) A cluster index is a sparse index 
(ii) A secondary key field with distinct values yields a dense index 
(a) (i) true (ii) true (b) (i) true (ii) false (c) (i) false (ii) true (d) (i) false (ii) false 
3. An index consisting of variable length entries where each index entry would be of the
form (K, B1↑, B2↑, B3↑, ..., Bt↑), where the Bi↑'s are block addresses of the various records
holding the same value for the secondary key K, can occur only in
(a) primary indexing (b) secondary indexing 
(c) cluster indexing (d) multilevel indexing 









The ADTs for selective data structures are separately presented for convenience of reference.

ADT for Queues

Data objects
    A finite set of elements of the same type
Operations
    Create an empty queue and initialize front and rear variables of the queue
        CREATE (QUEUE, FRONT, REAR)
    Check if queue QUEUE is empty
        CHK_QUEUE_EMPTY (QUEUE) (Boolean function)
    Check if queue QUEUE is full
        CHK_QUEUE_FULL (QUEUE) (Boolean function)
    Insert ITEM into queue QUEUE
        ENQUEUE (QUEUE, ITEM)
    Delete element from queue QUEUE and output the element deleted in ITEM
        DEQUEUE (QUEUE, ITEM)
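
One way the operations of the queue ADT might be realized is sketched below. This is only an illustration: the class name `BoundedQueue`, the fixed capacity and the circular use of the array are assumptions of the example and are not prescribed by the ADT.

```python
class BoundedQueue:
    """Array-based queue of fixed capacity (a sketch of the queue ADT)."""

    def __init__(self, capacity):
        # CREATE: an empty queue with its front position initialized.
        self.items = [None] * capacity
        self.front = 0
        self.count = 0

    def is_empty(self):
        # Corresponds to the Boolean check for an empty queue.
        return self.count == 0

    def is_full(self):
        # Corresponds to the Boolean check for a full queue.
        return self.count == len(self.items)

    def enqueue(self, item):
        # Insert item at the rear of the queue.
        if self.is_full():
            raise OverflowError("queue is full")
        rear = (self.front + self.count) % len(self.items)
        self.items[rear] = item
        self.count += 1

    def dequeue(self):
        # Delete the element at the front and return it.
        if self.is_empty():
            raise IndexError("queue is empty")
        item = self.items[self.front]
        self.items[self.front] = None
        self.front = (self.front + 1) % len(self.items)
        self.count -= 1
        return item


queue = BoundedQueue(3)
for job in ("J1", "J2", "J3"):
    queue.enqueue(job)
first_out = queue.dequeue()
```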


Programming Assignments are given at the end of each chapter.

Programming Assignment

1. Write a program to input a binary tree implemented as a linked representation. Execute
   Algorithms 8.1-8.3 to perform inorder, postorder and preorder traversals of the binary tree.
2. Implement Algorithm 8.4 to convert an infix expression into its postfix form.
3. Write a recursive procedure to count the number of nodes in a binary tree.
4. Implement a threaded binary tree. Write procedures to insert a node NEW to the left of node
   NODE when
   (i) the left subtree of NODE is empty, and
   (ii) the left subtree of NODE is non-empty.










Internal Sorting 


Algorithm 16.7: Procedure for Partition

procedure PARTITION(L, first, last, loc)
/* L[first:last] is the list to be partitioned. loc is the
   position where the pivot element finally settles down */
   left = first;
   right = last+1;
   pivot_elt = L[first];            /* set the pivot element to the first
                                       element in list L */
   while (left < right) do
      repeat
         left = left+1;             /* pivot element moves left to right */
      until L[left] ≥ pivot_elt;
      repeat
         right = right-1;           /* pivot element moves right to left */
      until L[right] ≤ pivot_elt;
      if (left < right) then swap(L[left], L[right]);   /* arrows face each other */
   end
   loc = right;
   swap(L[first], L[right]);        /* arrows have crossed each other - exchange
                                       pivot element L[first] with L[right] */
end PARTITION.


Example 16.13 Let us quick sort the list L = {5, 1, 26, 15, 76, 34, 15}. The various phases of
the sorting process are shown in Fig. 16.8. When the partitioned sublists contain only one element 
then no sorting is done. Also in phase 4 of Fig. 16.8 observe how the pivot element 34 exchanges 
with itself. The final sorted list is {1, 5, 15, 15, 26, 34, 76}. 


Algorithm 16.8: Procedure for Quick Sort

procedure QUICK_SORT(L, first, last)
/* L[first:last] is the unordered list of elements to be
   quick sorted. The call to the procedure to sort the
   list L[1:n] would be QUICK_SORT(L, 1, n) */
   if (first < last) then
   {
      PARTITION(L, first, last, loc);   /* partition the list into two
                                           sublists at loc */
      QUICK_SORT(L, first, loc-1);      /* quick sort the sublist
                                           L[first, loc-1] */
      QUICK_SORT(L, loc+1, last);       /* quick sort the sublist
                                           L[loc+1, last] */
   }
end QUICK_SORT.
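
The two procedures can be tried out with the following runnable sketch. It is an adaptation rather than a transcription: it uses 0-based indices, returns the pivot position instead of passing `loc` as a parameter, and adds a bounds guard on the left scan (an assumption made here so that the scan cannot run past `last` when the pivot is the largest element).

```python
def partition(lst, first, last):
    """Partition lst[first..last] around lst[first]; return the pivot's final index."""
    pivot = lst[first]
    left, right = first, last + 1
    while left < right:
        # Move left to right until an element >= pivot is met.
        left += 1
        while left <= last and lst[left] < pivot:  # guard added in this sketch
            left += 1
        # Move right to left until an element <= pivot is met.
        right -= 1
        while lst[right] > pivot:
            right -= 1
        if left < right:  # the two scans still face each other
            lst[left], lst[right] = lst[right], lst[left]
    # The scans have crossed: exchange the pivot with lst[right].
    lst[first], lst[right] = lst[right], lst[first]
    return right


def quick_sort(lst, first, last):
    """Sort lst[first..last] in place."""
    if first < last:
        loc = partition(lst, first, last)
        quick_sort(lst, first, loc - 1)
        quick_sort(lst, loc + 1, last)


data = [5, 1, 26, 15, 76, 34, 15]
quick_sort(data, 0, len(data) - 1)
print(data)
```

Run on the list of Example 16.13, the sketch yields the sorted list stated in the example.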





Stability and performance analysis 


Quick sort is not a stable sort. During the partitioning process keys which are equal are subject 
to exchange and hence undergo changes in their relative orders of occurrence in the sorted list. 


Extensive examples are given to illustrate 
theoretical concepts. 


































Pseudo-code algorithms are given for 
better comprehension 




















Example 5.3 Let JOB be a queue of jobs to be undertaken at a factory shop floor for service by
a machine. Let high (2), medium (1) and low (0) be the priorities accorded to jobs. Let Ji(k)
indicate a job Ji to be undertaken with priority k. The implementations of a priority queue to keep
track of the jobs, using the two methods of implementation discussed above, are illustrated for
a sample set of job arrivals (insertions) and job service completions (deletions).

Opening JOB queue: J1(1) J2(1) J3(0)

Operations on the JOB queue in chronological order:

1. J4(2) arrives
2. J5(2) arrives
3. Execute job
4. Execute job
5. Execute job









[Table: the JOB priority queue implemented (a) as a cluster of queues, one JOB queue for each of the priorities high (2), medium (1) and low (0), and (b) as a single queue whose elements are kept sorted, with a Remarks column. Initial configuration (opening JOB queue): the high priority queue is empty, the medium priority queue holds J1(1) and J2(1), and the low priority queue holds J3(0). Step 1, J4(2) arrives: J4(2) is inserted into the high priority queue.]

(Contd.)


The Online Learning Centre at www.mhhe.com/pai/dsa contains C programs for algorithms present
in the text, Sample Questions with Solutions and Web Links.




CHAPTER

INTRODUCTION

1.1 History of Algorithms
1.2 Definition, Structure and Properties of Algorithms
1.3 Development of an Algorithm
1.4 Data structures and Algorithms
1.5 Data Structure: Definition and Classification
1.6 Organization of the Book

While looking around and marveling at the technological advancements of this world, both
within and without, one cannot but perceive the intense and intrinsic association of the disciplines
of Science and Engineering and their allied and hybrid counterparts, with the ubiquitous machines
called computers. In fact it is difficult to spot a discipline that has distanced itself from the
discipline of computer science. To quote a few, be it a medical surgery or diagnosis performed by
robots or doctors on patients half way across the globe, or the launching of space crafts and
satellites into outer space, or forecasting tornadoes and cyclones, or the more mundane needs of
online reservations of tickets or billing at the food store, or control of washing machines etc., one
cannot but deem computers to be omnipresent, omnipotent, why even omniscient! (Refer Fig. 1.1.)

[Figure: a computer at the centre, linked to Business, Agriculture, Healthcare, Industry, Transportation, Space Technology, Weather and Science]

Fig. 1.1 Omnipresence of computers







In short, any discipline that calls for problem-solving using computers, looks up to the
discipline of computer science for efficient and effective methods and techniques of solutions to
the problems in their respective fields. From the point of view of problem solving, the discipline
of computer science could be naively categorized into the following four sub areas
notwithstanding the overlaps and grey areas amongst themselves:


• Machines      What machines are appropriate or available for the solution of a problem?
                What is the machine configuration (its processing power, memory capacity,
                etc.) that would be required for the efficient execution of the solution?

• Languages     What is the language or software with which the solution of the problem
                needs to be coded? What are the software constraints that would hamper the
                efficient implementation of the solution?

• Foundations   What is the model of a problem and its solution? What methods need to be
                employed for the efficient design and implementation of the solution? What is
                its performance measure?

• Technologies  What are the technologies that need to be incorporated for the solution of the
                problem? For example, does the solution call for a web based implementation,
                or need activation from mobile devices, or call for hand shaking broadcasting
                devices, or merely need to interact with high end or low end peripheral
                devices?

Figure 1.2 illustrates the categorization of the discipline of computer science from the point of 
view of problem solving. 

One of the core fields that belongs to the foundations of computer science deals with
the design, analysis and implementation of algorithms for the efficient solution of the
problems concerned. An algorithm may be loosely defined as a process, or procedure or
method or recipe. It is a specific set of rules to obtain a definite output from specific
inputs provided to the problem.

The subject of data structures is intrinsically connected with the design and
implementation of efficient algorithms. Data structures deals with the study of
methods, techniques and tools to organize or structure data.

Next, the history, definition, classification, structure and properties of algorithms are
discussed.

[Figure: the four sub areas Machines, Languages, Foundations and Technologies]

Fig. 1.2 Discipline of computer science from the point of view of problem solving


History of Algorithms 1.1 


The word algorithm originates from the word algorism, which derives from the Latinized name of the
Persian mathematician Abu Jafar Mohammed Ibn Musa Al Khwarizmi (circa 825 A.D.). Al Khwarizmi




is considered to be the first algorithm designer for adding numbers represented in the Hindu 
numeral system. The algorithm designed by him and followed till today, calls for summing up 
the digits occurring at specific positions and the previous carry digit, repetitively moving from 
the least significant digit to the most significant digit until the digits have been exhausted. 


Example 1.1 Demonstration of Al Khwarizmi’s algorithm for the addition of 987 and 76: 


        987 +               987 +               987 +
         76                  76                  76
                    Carry    1          Carry   1
    ---------           ---------           ---------
    (Carry 1) 3         (Carry 1) 63             1063


Definition, Structure and Properties of Algorithms 





Definition An algorithm may be defined as a finite sequence of instructions each of which has 
a clear meaning and can be performed with a finite amount of effort in a finite length of time. 


Structure and properties 


An algorithm has the following structure: 
(i) Input step
(ii) Assignment step
(iii) Decision step
(iv) Repetitive step
(v) Output step


Example 1.2 Consider the demonstration of Al Khwarizmi’s algorithm shown on the addition
of the numbers 987 and 76 in Example 1.1. In this, the input step considers the two operands 987
and 76 for addition. The assignment step sets the pair of digits from the two numbers and the
previous carry digit if it exists, for addition. The decision step decides at each step whether the
added digits yield a value that is 10 or more and if so, to generate the appropriate carry digit.
The repetitive step repeats the process for every pair of digits beginning from the least significant
digit onwards. The output step releases the output which is 1063.
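
The five steps can be traced in a short sketch of digit-by-digit addition. The function name `add_digitwise` and the choice to pass the operands as decimal strings are assumptions of this illustration.

```python
def add_digitwise(a, b):
    """Add two non-negative decimal numbers given as strings, digit by digit."""
    # Input step: take the two operands, least significant digit first.
    digits_a = [int(ch) for ch in reversed(a)]
    digits_b = [int(ch) for ch in reversed(b)]
    result = []
    carry = 0
    # Repetitive step: one pass for every digit position.
    for i in range(max(len(digits_a), len(digits_b))):
        # Assignment step: pick the pair of digits and the previous carry.
        da = digits_a[i] if i < len(digits_a) else 0
        db = digits_b[i] if i < len(digits_b) else 0
        total = da + db + carry
        # Decision step: a sum of 10 or more generates a carry digit.
        if total >= 10:
            result.append(total - 10)
            carry = 1
        else:
            result.append(total)
            carry = 0
    if carry:
        result.append(carry)
    # Output step: release the digits, most significant first.
    return "".join(str(d) for d in reversed(result))


print(add_digitwise("987", "76"))
```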

An algorithm is endowed with the following properties: 


Finiteness     an algorithm must terminate after a finite number of steps.

Definiteness   the steps of the algorithm must be precisely defined or unambiguously specified.

Generality     an algorithm must be generic enough to solve all problems of a particular
               class.

Effectiveness  the operations of the algorithm must be basic enough to be put down on pencil
               and paper. They should not be too complex to warrant writing another algorithm
               for the operation!

Input-Output   the algorithm must have certain initial and precise inputs, and outputs that
               may be generated both at its intermediate and final steps.

An algorithm does not enforce a language or mode for its expression but only demands
adherence to its properties. Thus one could even write an algorithm in one’s own expressive way
to make a cup of hot coffee! However, it has been observed that a cooking recipe that calls for
instructions such as ‘add a pinch of salt and pepper’ or ‘fry until it turns golden brown’ is
“anti-algorithmic”, for the reason that terms such as ‘a pinch’ and ‘golden brown’ are subject to
ambiguity and hence violate the property of definiteness!

An algorithm may be represented using pictorial representations such as flow charts. An 
algorithm encoded in a programming language for implementation on a computer is called a 
program. However, there exists a school of thought which distinguishes between a program and 
an algorithm. The claim put forward by them is that programs need not exhibit the property of 
finiteness which algorithms insist upon and quote an operating systems program as a counter 
example. An operating system is supposed to be an ‘infinite’ program which terminates only 
when the system crashes! At all other times other than its execution, it is said to be in the ‘wait’ 
mode! 





Development of an Algorithm 1.3


The steps involved in the development of an algorithm are as follows: 


(i) Problem statement
(ii) Model formulation
(iii) Algorithm design
(iv) Algorithm correctness
(v) Implementation
(vi) Algorithm analysis
(vii) Program testing
(viii) Documentation


Once a clear statement of the problem is done, the model for the solution of the problem is to 
be formulated. The next step is to design the algorithm based on the solution model that is 
formulated. It is here that one sees the role of data structures. The right choice of the data 
structure needs to be made at the design stage itself since data structures influence the efficiency 
of the algorithm. Once the correctness of the algorithm is checked and the algorithm 
implemented, the most important step of measuring the performance of the algorithm is done. 
This is what is termed as algorithm analysis. It can be seen how the use of appropriate data 
structures results in a better performance of the algorithm. Finally the program is tested and the 
development ends with proper documentation. 


Data Structures and Algorithms 1.4 


As was detailed in the previous section, the design of an efficient algorithm for the solution of a
problem calls for the inclusion of appropriate data structures. A clear, unambiguous set of
instructions following the properties of the algorithm does not, by itself, contribute to the efficiency
of the solution. It is essential that the data on which the algorithm needs to work is
appropriately structured to suit the needs of the problem, thereby contributing to the efficiency of
the solution.

For example, let us consider the problem of searching for a telephone number of a person, in
the telephone directory. It is well known that searching for the telephone number in the directory
is an easy task since the data is sorted according to the alphabetical order of the subscribers’
names. All that the search calls for, is to turn over the pages until one reaches the page that is
approximately closest to the subscriber’s name and undertake a sequential search in the relevant
page. Now, what if the telephone directory were to have its data arranged according to the order
in which the subscriptions for telephones were received? What a mess it would be! One may need
to go through the entire directory, name after name, page after page, in a sequential fashion until
the name and the corresponding telephone number are retrieved!

This is a classic example to illustrate the significant role played by data structures
in the efficiency of algorithms. The problem was retrieval of a telephone number. The
algorithm was a simple search for the name in the directory to retrieve the corresponding
telephone number. In the first case, since the data was appropriately structured
(sorted according to alphabetical order), the search algorithm undertaken turned
out to be efficient. On the other hand, in the second case, when the data was
unstructured, the search algorithm turned out to be crude and hence inefficient.
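
The contrast between the two directories can be sketched in code. The subscriber names and numbers below are invented for illustration, and the use of binary search (through Python's `bisect` module) on the sorted directory is a stand-in for the page-turning search described above, not the book's own procedure.

```python
from bisect import bisect_left


def lookup_sorted(names, numbers, target):
    """Search a directory whose names are in alphabetical order."""
    pos = bisect_left(names, target)  # halves the search range at each step
    if pos < len(names) and names[pos] == target:
        return numbers[pos]
    return None


def lookup_unsorted(entries, target):
    """Search a directory kept in order of subscription: name after name."""
    for name, number in entries:
        if name == target:
            return number
    return None


# Hypothetical directory data, listed in order of subscription.
by_arrival = [("Ravi", "2201"), ("Anita", "2202"), ("Meena", "2203"), ("Kumar", "2204")]

# The same data structured by alphabetical order of the names.
by_name = sorted(by_arrival)
names = [name for name, _ in by_name]
numbers = [number for _, number in by_name]

print(lookup_sorted(names, numbers, "Meena"), lookup_unsorted(by_arrival, "Meena"))
```

Both functions return the same number; the difference lies in how many names have to be examined as the directory grows.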

For the design of efficient programs and for the solution of problems, it is essential
that algorithm design goes hand in hand with appropriate data structures. (Refer Fig. 1.3.)

[Figure: algorithms and data structures together leading to problem solving]

Fig. 1.3 Algorithms and Data structures for efficient problem solving using computers

Data Structure—Definition and Classification 





Abstract data types 


A data type refers to the type of values that variables in a programming language hold. Thus the 
data types of integer, real, character, Boolean which are inherently provided in programming 
languages are referred to as primitive data types. 

A list of elements is called a data object. For example, we could have a list of integers or a
list of alphabetical strings as data objects.

The data objects which comprise the data structure, together with their fundamental operations, are
known as an Abstract Data Type (ADT). In other words, an ADT is defined as a set of data objects
D defined over a domain L and supporting a list of operations O.


Example 1.3 Consider an ADT for the data structure of positive integers called
POSITIVE_INTEGER defined over a domain of integers $Z^{+}$, supporting the operations of addition (ADD),
subtraction (MINUS) and check if positive (CHECK_POSITIVE). The ADT is defined as follows:

$L = Z^{+}$, $D = \{x \mid x \in L\}$, $O = \{\text{ADD}, \text{MINUS}, \text{CHECK\_POSITIVE}\}$

A descriptive and clear presentation of the ADT is as follows:

Data objects
    Set of all positive integers D
    $D = \{x \mid x \in L\}$
Operations
    Addition of positive integers INT1 and INT2 into RESULT
        ADD (INT1, INT2, RESULT)
    Subtraction of positive integers INT1 and INT2 into RESULT
        SUBTRACT (INT1, INT2, RESULT)
    Check if a number INT1 is a positive integer
        CHECK_POSITIVE (INT1) (Boolean function)
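
The ADT above says what the operations do, not how they are carried out. One possible realization is sketched below; the function names, the use of return values in place of the RESULT parameter, and the decision to raise an error when a result leaves the domain of positive integers are all assumptions of this illustration.

```python
def check_positive(value):
    """Boolean function: is value a positive integer?"""
    return isinstance(value, int) and value > 0


def add(int1, int2):
    """Addition of two positive integers."""
    if not (check_positive(int1) and check_positive(int2)):
        raise ValueError("operands must be positive integers")
    return int1 + int2


def subtract(int1, int2):
    """Subtraction of two positive integers."""
    if not (check_positive(int1) and check_positive(int2)):
        raise ValueError("operands must be positive integers")
    result = int1 - int2
    # Assumption of this sketch: a result outside the domain is rejected.
    if not check_positive(result):
        raise ValueError("result is not a positive integer")
    return result


print(add(3, 4), subtract(9, 4), check_positive(0))
```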





An ADT promotes data abstraction and focuses on what a data structure does rather than how
it does it. It is easier to comprehend a data structure by means of its ADT since it helps a designer
to plan the implementation of the data objects and their supporting operations in any
programming language belonging to any paradigm such as procedural, object oriented or
functional. Quite often one data structure calls for other data structures
for its implementation. For example, the stack and queue data structures are
implemented using either arrays or lists.

While deciding on the ADT of a data structure, a designer may decide on the set of operations 
O that are to be provided, based on the application and accessibility options provided to various 
users making use of the ADT implementation. 

The ADTs for various data structures discussed in the book are presented as box items in the 
respective chapters. 


Classification 


Figure 1.4 illustrates the classification of data structures. The data structures are broadly classified
as linear data structures and non-linear data structures. Linear data structures are
uni-dimensional in structure and represent linear lists. These are further classified as sequential and
linked representations. On the other hand, non-linear data structures are two-dimensional
representations of data lists. The individual data structures listed under each class have been
shown in Fig. 1.4.


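[Figure: data structures divided into Linear and Non-linear. Entries recoverable from the figure include linked lists, linked stacks and linked queues on the linear side and trees on the non-linear side, along with priority queues.]

Fig. 1.4 Classification of data structures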







Organization of the book 


The book is divided into five parts. Chapter 1 deals with an introduction to the subject of data 
structures and algorithms. Chapter 2 introduces analysis of algorithms. 

Part I discusses linear data structures and includes three chapters pertaining to sequential 
data structures. Chapters 3, 4 and 5 discuss the data structures of arrays, stacks and queues. 

Part II also discusses linear data structures and incorporates two chapters on linked data
structures. Chapter 6 discusses linked lists in their entirety and Chapter 7 details linked stacks and
queues.

Part III discusses the non-linear data structures of trees and graphs. Chapter 8 discusses trees
and binary trees and Chapter 9 details graphs.

Part IV discusses some of the advanced data structures. Chapter 10 discusses binary search trees 
and AVL trees. Chapter 11 details B trees and tries. Chapter 12 deals with red—black trees and 
splay trees. Chapter 13 discusses hash tables and Chapter 14 describes methods of file 
organizations. 

The ADTs for some of the fundamental data structures discussed in Parts I, II, III and IV
have been provided towards the end of the appropriate chapters.

Part V deals with searching and sorting techniques. Chapter 15 discusses searching techniques, 
Chapter 16 details internal sorting methods and Chapter 17 describes external sorting methods. 


Summary


> Any discipline in Science and Engineering that calls for solving problems using computers, 
looks up to the discipline of Computer Science for its efficient solution. 

> From the point of view of solving problems, computer science can be naively categorized 
into the four areas of machines, languages, foundations and technologies. 

> The subjects of Algorithms and Data structures fall under the category of foundations. The 
design formulation of algorithms for the solution of problems and the inclusion of 
appropriate data structures for their efficient implementation must progress hand in hand. 

> An Abstract Data Type (ADT) describes the data objects which constitute the data structure 
and the fundamental operations supported on them. 

> Data structures are classified as linear and non linear data structures. Linear data structures 
are further classified as sequential and linked data structures. While arrays, stacks and 
queues are examples of sequential data structures, linked lists, linked stacks and queues are 
examples of linked data structures. 

> The non-linear data structures include trees and graphs.

> The tree data structure includes variants such as binary search trees, AVL trees, B trees, 
Tries, Red Black trees and Splay trees. 





CHAPTER

ANALYSIS OF ALGORITHMS

2.1 Efficiency of Algorithms
2.2 Apriori Analysis
2.3 Asymptotic Notations
2.4 Time Complexity of an Algorithm using O Notation
2.5 Polynomial vs Exponential Algorithms
2.6 Average, Best and Worst Case Complexities
2.7 Analyzing Recursive Programs

In the previous chapter we introduced the discipline of computer science from the perspective
of problem solving. It was detailed how problem solving using computers calls not just for good
algorithm design but also for the appropriate use of data structures to render them efficient. This
chapter discusses methods and techniques to analyze the efficiency of algorithms.

Efficiency of Algorithms

When there is a problem to be solved it is probable that several algorithms crop up for its
solution and therefore one is at a loss to know which one is the best. This raises the question of
how one could decide on which among the algorithms is preferable and which among them is the
best.

The performance of algorithms can be measured on the scales of time and space. The former
would mean looking for the fastest algorithm for the problem or that which performs its task in the
minimum possible time. In this case the performance measure is termed time complexity. The
time complexity of an algorithm or a program is a function of the running time of the algorithm
or program.

In the case of the latter, it would mean looking for an algorithm that consumes or needs limited
memory space for its execution. The performance measure in such a case is termed space
complexity. The space complexity of an algorithm or a program is a function of the space needed
by the algorithm or program to run to completion. However, in this book our discussions
focus mostly on the time complexities of the algorithms presented.

The time complexity of an algorithm can be computed either by an empirical or theoretical
approach.

The empirical or posteriori testing approach calls for implementing the complete algorithms
and executing them on a computer for various instances of the problem. The times taken by the
execution of the programs for various instances of the problem are noted and compared. The
algorithm whose implementation yields the least time is considered the best among the
candidate algorithmic solutions.


The theoretical or apriori approach calls for mathematically determining the resources such as 
time and space needed by the algorithm, as a function of a parameter related to the instances of 
the problem considered. A parameter that is often used is the size of the input instances. For 
example, for the problem of searching for a name in the telephone directory, an apriori approach 
could determine the efficiency of the algorithm used, in terms of the size of the telephone 
directory (i.e.) the number of subscribers listed in the directory. There exist algorithms for various 
classes of problems which make use of the number of basic operations such as additions or 
multiplications or element comparisons, as a parameter to determine their efficiency. 

The disadvantage of posteriori testing is that it is dependent on various other factors such as 
the machine on which the program is executed, the programming language with which it is 
implemented and why, even on the skills of the programmer who writes the program code! On 
the other hand, the advantage of apriori analysis is that it is entirely machine, language and 
program independent. 

The efficiency of a newly discovered algorithm over that of its predecessors can be better 
assessed only when they are tested over large input instance sizes. For smaller to moderate input 
instance sizes it is highly likely that their performances may break even. In the case of posteriori 
testing, practical considerations may permit testing the efficiency of the algorithm only on input 
instances of moderate sizes. On the other hand, apriori analysis permits study of the efficiency 
of algorithms on any input instance of any size. 


Apriori Analysis 2.2 





Let us consider a program statement, for example, x = x + 2 in a sequential programming 
environment. We do not consider any parallelism in the environment. Apriori estimation is 
interested in the following for the computation of efficiency: 
(i) the number of times the statement is executed in the program, known as the frequency count 
of the statement, and 
(ii) the time taken for a single execution of the statement. 

To consider the second factor would render the estimation machine dependent since the time 
taken for the execution of the statement is determined by the machine instruction set, the machine 
configuration, and so on. Hence apriori analysis considers only the first factor and computes the 
efficiency of the program as a function of the total frequency count of the statements comprising 
the program. The estimation of efficiency is restricted to the computation of the total frequency 
count of the program. 

Let us estimate the frequency count of the statement x = x + 2 occurring in the following three
program segments (A, B, C):

Program segment A      Program segment B      Program segment C

x = x + 2;             for k = 1 to n do      for j = 1 to n do
                          x = x + 2;             for k = 1 to n do
                       end                          x = x + 2;
                                                 end
                                              end





The frequency count of the statement in the program segment A is 1. In the program segment
B, the frequency count of the statement is $n$, since the for loop in which the statement is
embedded executes $n$ ($n \ge 1$) times. In the program segment C, the statement is executed $n^2$ ($n \ge 1$)
times since the statement is embedded in a nested for loop, executing $n$ times each.
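
These frequency counts can be confirmed empirically by instrumenting the statement with a counter. The sketch below is only a check for particular values of n; the function names and the counter variable are assumptions of the illustration.

```python
def frequency_segment_a():
    """Count executions of x = x + 2 when it stands alone."""
    x, count = 0, 0
    x = x + 2
    count += 1
    return count


def frequency_segment_b(n):
    """Count executions of x = x + 2 inside a single loop of n iterations."""
    x, count = 0, 0
    for _ in range(1, n + 1):
        x = x + 2
        count += 1
    return count


def frequency_segment_c(n):
    """Count executions of x = x + 2 inside two nested loops of n iterations each."""
    x, count = 0, 0
    for _ in range(1, n + 1):
        for _ in range(1, n + 1):
            x = x + 2
            count += 1
    return count


print(frequency_segment_a(), frequency_segment_b(5), frequency_segment_c(5))
```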







In apriori analysis, the frequency count $f_i$ of each statement $i$ of the program is computed and
summed up to obtain the total frequency count $T = \sum_{i} f_i$.

The computation of the total frequency counts of the program segments A, B, and C is shown
in Tables 2.1, 2.2 and 2.3. It is well known that the opening statement of a for loop such as
for i = low_index to up_index executes (up_index - low_index + 1) + 1 times and the
statements within the loop are executed (up_index - low_index) + 1 times. In the
case of nested for loops, it is easier to compute the frequency counts of the embedded statements
making judicious use of the following fundamental mathematical formulae:

$$\sum_{i=1}^{n} 1 = n, \qquad \sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \qquad \sum_{i=1}^{n} i^{2} = \frac{n(n+1)(2n+1)}{6}$$

Table 2.1 Total frequency count of program segment A

Program statements          Frequency count
x = x + 2;                  1
Total frequency count       1

Table 2.2 Total frequency count of program segment B

Program statements          Frequency count
for k = 1 to n do           (n + 1)
   x = x + 2;               n
end                         n
Total frequency count       3n + 1

Table 2.3 Total frequency count of program segment C

Program statements          Frequency count
for j = 1 to n do           (n + 1)
   for k = 1 to n do        (n + 1)n
      x = x + 2;            n^2
   end                      n^2
end                         n
Total frequency count       3n^2 + 3n + 1

Observe in Table 2.3 how the frequency count of the statement for k = 1 to n do is computed
as

$$\sum_{j=1}^{n} \bigl((n - 1 + 1) + 1\bigr) = \sum_{j=1}^{n} (n + 1) = (n + 1)n$$

The total frequency counts of the program segments A, B and C given by $1$, $(3n + 1)$ and
$3n^2 + 3n + 1$ respectively, are expressed as $O(1)$, $O(n)$ and $O(n^2)$ respectively. These notations
mean that the orders of the magnitude of the total frequency counts are proportional to $1$, $n$ and
$n^2$ respectively. The notation $O$ has a mathematical definition as discussed in Sec. 2.3. These are
referred to as the time complexities of the program segments since they are indicative of the
running times of the program segments. In a similar manner, one could also discuss the
space complexities of programs, that is, the amount of memory they require to run
to completion. The space complexities can also be expressed in terms of mathematical notations.


Asymptotic Notations 2.3





Apriori analysis employs the following notations to express the time complexity of algorithms. 
These are termed asymptotic notations since they are meaningful approximations of functions 
that represent the time or space complexity of a program. 


Definition 2.1: $f(n) = O(g(n))$ (read as f of n is “big oh” of g of n), if there exists a positive
integer $n_0$ and a positive number $C$ such that $|f(n)| \le C|g(n)|$, for all $n \ge n_0$.


Example   f(n)                      g(n)
          $16n^3 + 78n^2 + 12n$     $n^3$     $f(n) = O(n^3)$
          $34n - 90$                $n$       $f(n) = O(n)$
          $56$                      $1$       $f(n) = O(1)$


Here g(n) is the upper bound of the function f(n). 


Definition 2.2: $f(n) = \Omega(g(n))$ (read as f of n is omega of g of n), if there exists a positive integer
$n_0$ and a positive number $C$ such that $|f(n)| \ge C|g(n)|$, for all $n \ge n_0$.


Example   f(n)                      g(n)
          $16n^3 + 8n^2 + 2$        $n^3$     $f(n) = \Omega(n^3)$
          $24n + 9$                 $n$       $f(n) = \Omega(n)$


Here g(n) is the lower bound of the function f(n). 


Definition 2.3: $f(n) = \Theta(g(n))$ (read as f of n is theta of g of n) if there exist two positive
constants $c_1$ and $c_2$, and a positive integer $n_0$ such that $c_1|g(n)| \le |f(n)| \le c_2|g(n)|$ for all $n \ge n_0$.


Example   f(n)                      g(n)
          $28n + 9$                 $n$       $f(n) = \Theta(n)$ since $f(n) > 28n$
                                              and $f(n) \le 37n$ for $n \ge 1$
          $16n^2 + 30n - 90$        $n^2$     $f(n) = \Theta(n^2)$
          $7 \cdot 2^n + 30n$       $2^n$     $f(n) = \Theta(2^n)$


From the definition it follows that the function g(n) is both an upper bound
and a lower bound for the function f(n) for all values of n, $n \ge n_0$. This means
that f(n) is such that $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.


Definition 2.4: $f(n) = o(g(n))$ (read as f of n is “little oh” of g of n) if $f(n) = O(g(n))$ and
$f(n) \ne \Omega(g(n))$.


Example   f(n)          g(n)
          $18n + 9$     $n^2$     $f(n) = o(n^2)$ since $f(n) = O(n^2)$ and $f(n) \ne \Omega(n^2)$;
                                  however, $f(n) \ne o(n)$.
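
The definitions above can be probed numerically. The following is a minimal sketch, not a proof: it only tests the inequalities over a finite range of n. The helper names `holds_upper` and `holds_lower`, the range limit `n_max` and the particular witnesses C and $n_0$ are assumptions chosen for illustration.

```python
def holds_upper(f, g, C, n0, n_max=10_000):
    """Check |f(n)| <= C * |g(n)| for every integer n with n0 <= n <= n_max."""
    return all(abs(f(n)) <= C * abs(g(n)) for n in range(n0, n_max + 1))


def holds_lower(f, g, C, n0, n_max=10_000):
    """Check |f(n)| >= C * |g(n)| for every integer n with n0 <= n <= n_max."""
    return all(abs(f(n)) >= C * abs(g(n)) for n in range(n0, n_max + 1))


def linear(n):
    return n


# Theta example: 28n + 9 lies between 28n and 37n for n >= 1.
theta_ok = (holds_lower(lambda n: 28 * n + 9, linear, 28, 1)
            and holds_upper(lambda n: 28 * n + 9, linear, 37, 1))

# Big-oh example: 34n - 90 is bounded by 34n once n >= 2 (it fails at n = 1).
big_oh_ok = holds_upper(lambda n: 34 * n - 90, linear, 34, 2)

# Little-oh example: the ratio (18n + 9) / n^2 keeps shrinking as n grows.
ratios = [(18 * n + 9) / n**2 for n in (10, 100, 1000)]
print(theta_ok, big_oh_ok, ratios)
```

The choice of $n_0$ matters: with C = 34 the bound for $34n - 90$ does not hold at n = 1, which is why the definition only asks for the inequality from some $n_0$ onwards.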





Time Complexity of an Algorithm Using O Notation 2.4


O notation is widely used to express the time complexity of algorithms. It can be gathered from
its definition (Definition 2.1) that if $f(n) = O(g(n))$ then g(n) acts as an upper bound for the
function f(n), where f(n) represents the computing time of the algorithm. When we say the time
complexity of the algorithm is $O(g(n))$, we mean that its execution takes a time that is no more
than a constant times g(n). Here n is a parameter that characterizes the input and/or output
instances of the algorithm.

Algorithms reporting $O(1)$ time complexity indicate constant running time. The time
complexities $O(n)$, $O(n^2)$ and $O(n^3)$ are called linear, quadratic and cubic time complexities
respectively. $O(\log n)$ time complexity is referred to as logarithmic. In general, time complexities
of the type $O(n^k)$ are called polynomial time complexities. In fact it can be shown that a
polynomial $A(n) = a_m n^m + a_{m-1} n^{m-1} + \dots + a_1 n + a_0 = O(n^m)$ (see Illustrative Problem 2.2). Time
complexities such as $O(2^n)$, $O(3^n)$, and in general $O(k^n)$, are called exponential time complexities.

Algorithms which report $O(\log n)$ time complexity are faster, for sufficiently large n, than if
they had reported $O(n)$. Similarly $O(n \log n)$ is better than $O(n^2)$, but not as good as $O(n)$. Some
of the commonly occurring time complexities, in ascending order of magnitude, are listed
below:

$O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2) < O(n^3) < O(2^n)$
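
The ordering above holds for sufficiently large n, which a few evaluated rows make visible. This is a small sketch; the function name `growth_row`, the choice of base 2 for the logarithm and the sample sizes are assumptions made for illustration.

```python
import math


def growth_row(n):
    """Values of 1, log n, n, n log n, n^2, n^3 and 2^n at input size n (log base 2 assumed)."""
    return [1, math.log2(n), n, n * math.log2(n), n**2, n**3, 2**n]


for n in (8, 16, 64):
    print(n, growth_row(n))
```

At n = 8 the last two entries are still out of order ($2^8 = 256$ is below $8^3 = 512$); from n = 16 onwards in these samples each row is strictly ascending, which is what "sufficiently large n" refers to.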





Polynomial Vs Exponential Algorithms 2.5 


If n is the size of the input instance, then the number of operations for polynomial algorithms is
of the form P(n), where P is a polynomial. In terms of O notation, polynomial algorithms have
time complexities of the form $O(n^k)$, where k is a constant.

In contrast, in exponential algorithms the number of operations is of the form $k^n$. In terms
of O notation, exponential algorithms have time complexities of the form $O(k^n)$, where k is a
constant.




It is clear from the inequalities listed above that polynomial algorithms are a lot more efficient
than exponential algorithms. From Table 2.4 it is seen that exponential algorithms can quickly get
beyond the capacity of any sophisticated computer due to their rapid growth rate (refer Fig. 2.1).
Here, it is assumed that the computer takes 1 microsecond per operation. While algorithms with the
time complexity functions $n^2$ and $n^3$ can be executed in a reasonable time, one can never hope to finish
the execution of exponential algorithms on large inputs even if the fastest computer were to be employed. Thus
if one were to find an algorithm that brings a problem down from exponential to polynomial time,
it would indeed be a great accomplishment!
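
The effect of the growth rate can be worked out directly under the stated assumption of 1 microsecond per operation. The sketch below is illustrative only: the four functions and the input sizes 10, 20 and 50 are assumptions chosen for the illustration and are not claimed to reproduce the entries of Table 2.4.

```python
SECONDS_PER_OPERATION = 1e-6  # 1 microsecond per operation, as assumed in the text
SECONDS_PER_YEAR = 365 * 24 * 3600

# Hypothetical selection of time complexity functions for the comparison.
functions = {
    "n^2": lambda n: n**2,
    "n^3": lambda n: n**3,
    "2^n": lambda n: 2**n,
    "3^n": lambda n: 3**n,
}
sizes = (10, 20, 50)

# running_time[name][k] is the time in seconds for input size sizes[k].
running_time = {
    name: [fn(n) * SECONDS_PER_OPERATION for n in sizes]
    for name, fn in functions.items()
}

for name, row in running_time.items():
    print(name, ["%.3g s" % t for t in row])
print("2^50 operations take about %.1f years" % (running_time["2^n"][2] / SECONDS_PER_YEAR))
```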


Table 2.4 Comparison of polynomial and exponential algorithms
(rows: time complexity function; columns: input size 10, 20, 50)


Fig. 2.1 Growth rate of some computing time functions
(x-axis: input size; y-axis: output of the computing time function; curves 3 and 4 are $n \log_2 n$ and $\log_2 n$)





Average, Best and Worst Case Complexities 2.6 


The time complexity of an algorithm is dependent on parameters associated with the input/output
instances of the problem. Very often the running time of the algorithm is expressed as a
function of the input size. In such a case it is fair enough to presume that the larger the input size
of the problem instance, the larger is its running time. But such is not always the case. There are
problems whose time complexity is dependent not just on the size of the input but on the nature
of the input as well. Example 2.1 illustrates this point.


Example 2.1 Algorithm: To sequentially search for the first-occurring even number in the 
list of numbers given. 


Input 1: -1, 3, 5, 7, -5, 7, 11, -13, 17, 71, 21, 9, 3, 1, 5, -23, -29, 33, 35, 37, 40 
Input 2: 6, 17, 71, 21, 9, 3, 1, 5, -23, 3, 64, 7, -5, 7, 11, 33, 35, 37, -3, -7, 11 
Input 3: 71, 21, 9, 3, 1, 5, -23, 3, 11, 33, 36, 37, -3, -7, 11, -5, 7, 11, -13, 17, 22 


Let us determine the efficiency of the algorithm for the input instances presented in terms of 
the number of comparisons done before the first occurring even number is retrieved. Observe that 
all three input instances are of the same size. 

In the case of Input 1, the first occurring even number occurs as the last element in the list. The 
algorithm would require 21 comparisons, equivalent to the size of the list, before it retrieves the 
element. On the other hand, in the case of Input 2 the first occurring even number shows up as 
the very first element of the list thereby calling for only one comparison before it is retrieved! If 
Input 2 is the best possible case that can happen for the quickest execution of the algorithm, then 
Input 1 is the worst possible case that can happen when the algorithm takes the longest possible 
time to complete. Generalizing, the time complexity of the algorithm in the best possible case 
would be expressed as O(1) and in the worst possible case would be expressed as O(n) where 
n is the size of the input. 
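
The comparison counts discussed above can be reproduced with a short sketch. The function name `first_even` and the convention of counting one comparison per element examined are assumptions made for illustration.

```python
def first_even(numbers):
    """Return (first even number, comparisons made); the value is None if no even number exists."""
    comparisons = 0
    for x in numbers:
        comparisons += 1          # one parity check per element examined
        if x % 2 == 0:
            return x, comparisons
    return None, comparisons


input_1 = [-1, 3, 5, 7, -5, 7, 11, -13, 17, 71, 21, 9, 3, 1, 5, -23, -29, 33, 35, 37, 40]
input_2 = [6, 17, 71, 21, 9, 3, 1, 5, -23, 3, 64, 7, -5, 7, 11, 33, 35, 37, -3, -7, 11]
input_3 = [71, 21, 9, 3, 1, 5, -23, 3, 11, 33, 36, 37, -3, -7, 11, -5, 7, 11, -13, 17, 22]

for name, data in (("Input 1", input_1), ("Input 2", input_2), ("Input 3", input_3)):
    print(name, len(data), first_even(data))
```

All three lists have 21 elements, yet the number of comparisons differs from one list to the next, which is the point of the example.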

This justifies the statement that the running time of an algorithm is dependent not just on the
size of the input but also on its nature. The input instance (or instances) for which the algorithm
takes the maximum possible time is called the worst case, and the time complexity in such a case
is referred to as the worst case time complexity. The input instance for which the algorithm
takes the minimum possible time is called the best case, and the time complexity in such a case is
referred to as the best case time complexity. All other input instances, which are neither of the two,
are categorized as average cases, and the time complexity of the algorithm in such cases is
referred to as the average case complexity. Input 3 is an example of an average case since it is
neither the best case nor the worst case. By and large, analyzing the average case behaviour of
algorithms is harder and more mathematically involved than analyzing their worst case and best
case counterparts. Also, such an analysis can be misleading if the input instances are not chosen
at random or appropriately to cover all possible cases that may arise when the algorithm is put
into practice.

Worst case analysis is appropriate when the response time of the algorithm is critical. For 
example, in the case of a nuclear power plant controller, it is critical to know of the maximum 
limit of the system response time regardless of the input instance that is to be handled by the 
system. The algorithms designed cannot have a running time that exceeds this response time 
limit. 

On the other hand in the case of applications where the input instances may be wide and 
varied and there is no knowing beforehand of the kind of input instance that has to be worked 
on, it is prudent to choose algorithms with good average case behaviour. 







Analyzing Recursive Programs 2.7 


Recursion is an important concept in computer science. Many algorithms can best be described 
in terms of recursion. 


Recursive procedures 


If P is a procedure containing a call statement to itself (Fig. 2.2(a)) or to another procedure that 
results in a call to itself (Fig. 2.2(b)), then the procedure P is said to be a recursive procedure. In 
the former case it is termed direct recursion and in the latter case it is termed indirect recursion. 

Extending the concept to programming can yield program functions or programs themselves 
that are recursively defined. In such cases they are referred to as recursive functions and recursive 
programs respectively. 

Extending the concept to mathematics would yield what are called recurrence relations. 

In order that a recursively defined function does not run into an infinite loop, it is essential
that the following properties are satisfied by any recursive procedure:

(i) There must be criteria, one or more, called the base criteria or simply base case(s), where the
procedure does not call itself either directly or indirectly.
(ii) Each time the procedure calls itself directly or indirectly, it must be closer to the base
criteria.


Fig. 2.2 Skeletal recursive procedures: (a) Direct recursion (b) Indirect recursion


Example 2.2 illustrates a recursive procedure and Example 2.3 a recurrence relation. Example 2.4
describes the Tower of Hanoi puzzle, which is a classic example of the application of recursion
and recurrence relations.


Example 2.2 A recursive procedure to compute the factorial of a number n is shown below:
    $n! = 1$,                   if n = 1 (base criterion)
    $n! = n \cdot (n-1)!$,      if n > 1
Note the recursion in the definition of the factorial function (!). n! calls (n-1)! for its definition. A
pseudo-code recursive function for the computation of n! is shown below:


      function factorial(n)
1-2.     if (n = 1) then factorial = 1;
         else
3.          factorial = n * factorial(n - 1);
      end factorial.
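
A direct transcription of the recursive definition into a runnable form might look as follows. This is a minimal sketch that assumes n is an integer with n >= 1, matching the base criterion of the definition; the guard against smaller values is an addition for safety.

```python
def factorial(n):
    """Recursive factorial for integers n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1")  # added guard, not part of the definition
    if n == 1:
        return 1                      # base criterion
    return n * factorial(n - 1)       # each call moves closer to the base criterion


print([factorial(n) for n in range(1, 8)])
```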





Example 2.3 A recurrence relation S(n) is defined as below:

$$S(n) = \begin{cases} 0, & \text{if } n = 1 \text{ (base criterion)} \\ S(n/2) + 1, & \text{if } n > 1 \end{cases}$$


Example 2.4 The Tower of Hanoi puzzle 

The Tower of Hanoi puzzle was invented by the French mathematician Edouard Lucas in 1883. 
There are three Pegs, Source (S), Intermediary (I) and Destination (D). Peg S contains a set of 
disks stacked to resemble a tower, with the largest disk at the bottom and the smallest at the top. 
Figure 2.3 illustrates the initial configuration of the Pegs for 6 disks. The objective is to transfer 
the entire tower of disks in Peg S, to Peg D, maintaining the same order of the disks. Also only 
one disk can be moved at a time and never can a larger disk be placed on a smaller disk during 
the transfer. The I Peg is for intermediate use during the transfer. 

A simple solution to the problem, for N = 3 disks, is given by the following transfers of disks:

1. Transfer disk from Peg S to Peg D
2. Transfer disk from Peg S to Peg I
3. Transfer disk from Peg D to Peg I
4. Transfer disk from Peg S to Peg D
5. Transfer disk from Peg I to Peg S
6. Transfer disk from Peg I to Peg D
7. Transfer disk from Peg S to Peg D


Fig. 2.3 Tower of Hanoi puzzle (initial configuration), showing Peg S, Peg I and Peg D


The solution to the puzzle calls for an application of recursive functions and recurrence relations. 
A skeletal recursive procedure for the solution of the problem for N number of disks, is as 
follows: 

1. Move the top N-1 disks from Peg S to Peg I (using D as an intermediary Peg) 

2. Move the bottom disk from Peg S to Peg D 

3. Move N-1 disks from Peg I to Peg D (using Peg S as an intermediary Peg) 
A pictorial representation of the skeletal recursive procedure for N = 6 disks is shown in Fig. 2.4. 
Function TRANSFER illustrates the recursive function for the solution of the problem. 


Fig. 2.4 Pictorial representation of the skeletal recursive procedure for Tower of Hanoi puzzle
(panels: move the top (N-1) disks (5 disks) from Peg S to Peg I using Peg D as the intermediate Peg;
move the bottom disk from Peg S to Peg D;
move (N-1) disks from Peg I to Peg D using Peg S as the intermediate Peg)


function TRANSFER(N, S, I, D)

/* N disks are to be transferred from peg S to peg D with
   peg I as the intermediate peg */

   if N is 0 then exit();

   else

   { TRANSFER(N-1, S, D, I);    /* transfer N-1 disks from peg S to
                                   peg I with peg D as the intermediate peg */

     Transfer disk from S to D;  /* move the disk which is the last
                                   and the largest disk, from peg S to peg D */

     TRANSFER(N-1, I, S, D);    /* transfer N-1 disks from peg I to
                                   peg D with peg S as the intermediate peg */ }

end TRANSFER.
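
The three-step recursive procedure can be sketched in runnable form as below. The function name `transfer`, the representation of a move as a (from_peg, to_peg) pair and the use of a list to collect the moves are assumptions made for illustration.

```python
def transfer(n, source, intermediate, destination, moves):
    """Append to `moves` the transfers that shift n disks from source to destination."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, onto the intermediate peg.
    transfer(n - 1, source, destination, intermediate, moves)
    # Move the largest remaining disk directly.
    moves.append((source, destination))
    # Bring the n-1 disks from the intermediate peg onto the destination.
    transfer(n - 1, intermediate, source, destination, moves)


moves = []
transfer(3, "S", "I", "D", moves)
for step, (src, dst) in enumerate(moves, start=1):
    print(step, "Transfer disk from Peg", src, "to Peg", dst)
```

For three disks this produces the same seven transfers as the simple solution listed in Example 2.4.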





Apriori analysis of recursive functions 


The apriori analysis of recursive functions is different from that of iterative functions. In the latter
case, as was seen in Sec. 2.2, the total frequency count of the program was computed before
approximating it using mathematical notations such as O. In the case of recursive functions
we first formulate recurrence relations that define the behaviour of the function. The solution of
the recurrence relation and its approximation using the conventional O or any other notation
yields the resulting time complexity of the program.







To frame the recurrence relation, we associate an unknown time function T(n) where n 
measures the size of the arguments to the procedure. We then get a recurrence relation for T(n) 
in terms of T(k) for various values of k. 

Example 2.5 illustrates obtaining the recurrence relation for the recursive factorial function 
FACTORIAL (n) shown in Example 2.2. 


Example 2.5 Let T(n) be the running time of the recursive function FACTORIAL(n). The
running time of lines 1 and 2 is O(1). The running time for line 3 is given by O(1) + T(n - 1).
Here T(n - 1) is the time complexity of the call to the recursive function FACTORIAL(n - 1). Thus
for some constants c, d,

$$T(n) = \begin{cases} c + T(n-1), & \text{if } n > 1 \\ d, & \text{if } n \le 1 \end{cases}$$

Example 2.6 derives the recurrence relation for the Tower of Hanoi puzzle. 


Example 2.6 The recurrence relation for the Tower of Hanoi puzzle is derived as follows: 
Let T(N) be the minimum number of transfers that are needed to solve the puzzle with N disks. 
From the function TRANSFER it is evident that for N = 0, no disks are transferred. Again, for N > 0,
two recursive calls, each enabling the transfer of (N - 1) disks, and a single transfer of the last
(largest) disk from peg S to peg D are done. Thus the recurrence relation is given by

$$T(N) = \begin{cases} 0, & \text{if } N = 0 \\ 2 \cdot T(N-1) + 1, & \text{if } N > 0 \end{cases}$$

Now what remains to be done is to solve the recurrence relation, in other words to solve for T(N).
Such a solution, where T(N) is expressed in a form in which no T occurs on the right side, is termed
a closed form solution in conventional mathematics.

The general method of solution is to repeatedly replace terms T(k) occurring on the right side 
of the recurrence relation, by the relation itself with appropriate change of parameters. The 
substitutions continue until one reaches a formula in which T does not appear on the right side. 
Quite often at this stage, it may be essential to sum a series which could be either an arithmetic 
progression or a geometric progression or some such series. Even if we cannot obtain a sum 
exactly, we could work to obtain at least a close upper bound on the sum, which could serve to 
act as an upper bound for T(n). 

Example 2.7 illustrates the solution of the recurrence relation for the function FACTORIAL (n), 
discussed in Example 2.5 and Example 2.8 illustrates the solution of the recurrence relation for 
the Tower of Hanoi puzzle, discussed in Example 2.6. 


Example 2.7 Solution of the recurrence relation

$$T(n) = \begin{cases} c + T(n-1), & \text{if } n > 1 \\ d, & \text{if } n \le 1 \end{cases}$$

yields the following steps.

$$\begin{aligned}
T(n) &= c + T(n-1) && \text{...(step 1)} \\
     &= c + (c + T(n-2)) \\
     &= 2c + T(n-2) && \text{...(step 2)} \\
     &= 2c + (c + T(n-3)) \\
     &= 3c + T(n-3) && \text{...(step 3)}
\end{aligned}$$

In the kth step the recurrence relation is transformed as

$$T(n) = k \cdot c + T(n-k), \quad \text{if } n > k \qquad \text{...(step k)}$$

Finally, when k = n - 1, we obtain

$$\begin{aligned}
T(n) &= (n-1) \cdot c + T(1) && \text{...(step } n-1\text{)} \\
     &= (n-1)c + d \\
     &= O(n)
\end{aligned}$$

Observe how the recursive terms in the recurrence relation are replaced so as to move the
relation closer to the base criterion viz., $T(n) = d$, $n \le 1$. The approximation of the closed form
solution obtained viz., $T(n) = (n-1)c + d$ yields $O(n)$.


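Example 2.8 Solution of the recurrence relation for the Tower of Hanoi puzzle,

$$T(N) = \begin{cases} 0, & \text{if } N = 0 \\ 2 \cdot T(N-1) + 1, & \text{if } N > 0 \end{cases}$$

yields the following steps.

$$\begin{aligned}
T(N) &= 2 \cdot T(N-1) + 1 && \text{...(step 1)} \\
     &= 2 \cdot (2 \cdot T(N-2) + 1) + 1 \\
     &= 2^2 \cdot T(N-2) + 2 + 1 && \text{...(step 2)} \\
     &= 2^2 \cdot (2 \cdot T(N-3) + 1) + 2 + 1 \\
     &= 2^3 \cdot T(N-3) + 2^2 + 2 + 1 && \text{...(step 3)}
\end{aligned}$$

In the kth step the recurrence relation is transformed as

$$T(N) = 2^k T(N-k) + 2^{k-1} + 2^{k-2} + \dots + 2^3 + 2^2 + 2 + 1 \qquad \text{...(step k)}$$

Finally, when k = N, we obtain

$$\begin{aligned}
T(N) &= 2^N T(0) + 2^{N-1} + 2^{N-2} + \dots + 2^3 + 2^2 + 2 + 1 && \text{...(step N)} \\
     &= 2^N \cdot 0 + (2^N - 1) \\
     &= 2^N - 1 \\
     &= O(2^N)
\end{aligned}$$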
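
Closed form solutions such as those of Examples 2.7 and 2.8 can be cross-checked by evaluating the recurrences directly. The sketch below does so; the function names `t_factorial` and `t_hanoi` and the sample constants c = 3 and d = 5 are assumptions chosen for illustration.

```python
def t_factorial(n, c, d):
    """T(n) = c + T(n-1) for n > 1, and T(n) = d for n <= 1."""
    return d if n <= 1 else c + t_factorial(n - 1, c, d)


def t_hanoi(n):
    """T(N) = 2 * T(N-1) + 1 for N > 0, and T(0) = 0."""
    return 0 if n == 0 else 2 * t_hanoi(n - 1) + 1


c, d = 3, 5  # arbitrary sample constants
print([t_factorial(n, c, d) for n in range(1, 6)])   # compare with (n-1)*c + d
print([t_hanoi(n) for n in range(0, 7)])             # compare with 2**n - 1
```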





Summary


> When several algorithms can be designed for the solution of a problem, there arises the 
need to determine which among them is the best. The efficiency of a program or an 
algorithm is measured by computing its time and/or space complexities. The time 
complexity of an algorithm is a function of the running time of the algorithm and the space 
complexity is a function of the space required by it to run to completion. 

> The time complexity of an algorithm can be measured using Apriori analysis or Posteriori 
testing. While the former is a theoretical approach that is general and machine 
independent, the latter is completely machine dependent. 


> The apriori analysis computes the time complexity as a function of the total frequency
count of the algorithm. Frequency count is the number of times a statement is executed in
a program.

> O, Ω, Θ and o are asymptotic notations that are used to express the time complexity of
algorithms. While O serves as the upper bound of the performance measure, Ω serves as
the lower bound.

> The efficiency of algorithms is not just dependent on the input size but is also dependent
on the nature of the input. This results in the categorization of worst, best and average case
complexities. Worst case time complexity is the time complexity for the input instance(s) on which
the algorithm takes the maximum possible time, and best case time complexity is that for the
instance(s) on which it takes the minimum possible time.

> Polynomial algorithms are highly efficient when compared to exponential algorithms. The
latter, due to their rapid growth rate, can quickly get beyond the computational capacity of
any sophisticated computer.

> Apriori analysis of recursive algorithms calls for the formulation of recurrence relations
and obtaining their closed form solutions, before expressing them using appropriate
asymptotic notations.


Illustrative Problems


Problem 2.1 If $T_1(n)$ and $T_2(n)$ are the time complexities of two program fragments $P_1$ and
$P_2$, where $T_1(n) = O(f(n))$ and $T_2(n) = O(g(n))$, find $T_1(n) + T_2(n)$ and $T_1(n) \cdot T_2(n)$.


Solution: Since $T_1(n) \le c \cdot f(n)$ for some positive number c and positive integer $n_1$ such that
$n \ge n_1$, and $T_2(n) \le d \cdot g(n)$ for some positive number d and positive integer $n_2$ such that $n \ge n_2$,
we obtain $T_1(n) + T_2(n)$ as follows:

$$T_1(n) + T_2(n) \le c \cdot f(n) + d \cdot g(n), \quad \text{for } n > n_0 \text{ where } n_0 = \max(n_1, n_2)$$

(i.e.)

$$T_1(n) + T_2(n) \le (c + d) \max(f(n), g(n)), \quad \text{for } n > n_0$$

Hence $T_1(n) + T_2(n) = O(\max(f(n), g(n)))$.

(This result is referred to as the Rule of Sums of O notation.)
To obtain $T_1(n) \cdot T_2(n)$, we proceed as follows:

$$T_1(n) \cdot T_2(n) \le c \cdot f(n) \cdot d \cdot g(n) \le k \cdot f(n) \cdot g(n), \quad \text{where } k = c \cdot d$$

Therefore, $T_1(n) \cdot T_2(n) = O(f(n) \cdot g(n))$.
(This result is referred to as the Rule of Products of O notation.)


Problem 2.2 If $A(n) = a_m n^m + a_{m-1} n^{m-1} + \dots + a_1 n + a_0$, then $A(n) = O(n^m)$ for $n \ge 1$.


Solution: Let us consider |A(n)|. We have,

$$\begin{aligned}
|A(n)| &= |a_m n^m + a_{m-1} n^{m-1} + \dots + a_1 n + a_0| \\
       &\le |a_m| n^m + |a_{m-1}| n^{m-1} + \dots + |a_1| n + |a_0| \\
       &\le (|a_m| + |a_{m-1}| + \dots + |a_1| + |a_0|) \cdot n^m \\
       &\le c \cdot n^m, \quad \text{where } c = |a_m| + |a_{m-1}| + \dots + |a_1| + |a_0|
\end{aligned}$$

Hence $A(n) = O(n^m)$.


Problem 2.3 Two algorithms A and B report time complexities expressed by the functions
$n^2$ and $2^n$ respectively. They are to be executed on a machine M which consumes $10^{-6}$ seconds to
execute an instruction. What is the time taken by the algorithms to complete their execution on
machine M for an input size of 50? If another machine N, which is 10 times faster than machine
M, is offered for the execution, what is the largest input size that can be handled by the two
algorithms on machine N? What are your observations?


Solution: Algorithms A and B report a time complexity of $n^2$ and $2^n$ respectively. In other words,
the algorithms execute approximately $n^2$ and $2^n$ instructions respectively. For an input size
of n = 50 and with a speed of $10^{-6}$ seconds per instruction, the time taken by the algorithms on
machine M is as follows:

Algorithm A: $50^2 \times 10^{-6} = 0.0025$ sec

Algorithm B: $2^{50} \times 10^{-6}$ sec $\approx 35$ years

If another machine N which is 10 times faster than machine M is offered, then the number of
instructions that algorithms A and B can execute on machine N in the same time would be 10 times more than
that on M. Let $x^2$ and $2^y$ be the number of instructions that algorithms A and B execute on
machine N. Then the new input size that each of these algorithms can handle is given by

Algorithm A: $x^2 = 10 \times n^2$

             $x = \sqrt{10} \times n \approx 3n$

             That is, algorithm A can handle about 3 times the original input size that it could
             handle on machine M.

Algorithm B: $2^y = 10 \times 2^n$

             $y = \log_2 10 + n \approx 3 + n$

             That is, algorithm B can handle just about 3 units more than the original input size that
             it could handle on machine M.

Observations: Since algorithm A is a polynomial algorithm, it displays a superior performance,
              executing the specified input on machine M in 0.0025 sec. Also, when offered
              a faster machine N, it is able to handle about 3 times the original input size that it could
              handle on machine M.

              In contrast, algorithm B is an exponential algorithm. While it takes about 35 years to
              process the specified input on machine M, despite the faster machine offered, it
              is able to process just 3 more units over the input size that it could handle on
              machine M.
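
The gain from the faster machine can also be found by exact integer arithmetic rather than by rounding $\sqrt{10}$ and $\log_2 10$. This is a sketch for the input size n = 50 used in the problem; the variable names are assumptions made for illustration.

```python
import math

n = 50
speedup = 10  # machine N is 10 times faster than machine M

# Instruction budgets on N in the time M needs for input size n.
budget_a = speedup * n**2
budget_b = speedup * 2**n

# Largest x with x^2 <= budget_a, and largest y with 2^y <= budget_b.
largest_x = math.isqrt(budget_a)
largest_y = budget_b.bit_length() - 1

print("Algorithm A: input size", largest_x, "=", largest_x / n, "times the original")
print("Algorithm B: input size", largest_y, "=", largest_y - n, "more than the original")
```

The polynomial algorithm gains a multiplicative factor in input size, while the exponential one gains only an additive constant.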


Problem 2.4 Analyze the behaviour of the following program which computes the nth 
Fibonacci number, for appropriate values of n. Obtain the frequency count of the statements (that 
are given line numbers) for various cases of n. 





         procedure Fibonacci(n)

1.          read(n);
2-4.        if (n < 0) then print("error"); exit();
5-7.        if (n = 0) then print("Fibonacci number is 0");
                            exit();
8-10.       if (n = 1) then print("Fibonacci number is 1");
                            exit();
11-12.      f1 = 0;
            f2 = 1;
13.         for i = 2 to n do
14-16.         f = f1 + f2;
               f1 = f2;
               f2 = f;
17.         end
18.         print("Fibonacci number is", f);

         end Fibonacci


Solution: The behaviour of the program can be analyzed for the cases as shown in Table I 2.4.

Table I 2.4
(columns: line number, frequency count of the statements for each case of n; last row: total frequency count)
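
The iterative scheme of the procedure can be sketched in runnable form as follows. This is an illustration under assumptions: the function returns the value instead of printing it, raises an exception for negative n in place of the error message, and also reports how many times the loop body ran so that the dependence of the frequency count on n can be seen.

```python
def fibonacci(n):
    """Return (nth Fibonacci number, number of loop-body executions)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0, 0
    if n == 1:
        return 1, 0
    f1, f2 = 0, 1
    iterations = 0
    for _ in range(2, n + 1):
        f1, f2 = f2, f1 + f2   # same effect as f = f1 + f2; f1 = f2; f2 = f
        iterations += 1
    return f2, iterations


for n in (0, 1, 2, 10):
    print(n, fibonacci(n))
```

For n > 1 the loop body runs n - 1 times, so the work grows linearly with n, whereas the cases n < 0, n = 0 and n = 1 finish after a fixed number of statements.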


Problem 2.5 Obtain the time complexity of the following program: 





procedure whirlpool(m)
begin
   if (m ≤ 0) then print("eddy!"); exit();

   else {
      swirl = whirlpool(m - 1) + whirlpool(m - 1);
      print("whirl");
   }
end whirlpool


Solution: We first obtain the recurrence relation for the time complexity of the procedure
whirlpool. Let T(m) be the time complexity of the procedure. The recurrence relation is formulated
as given below:

$$T(m) = \begin{cases} a, & \text{if } m \le 0 \\ 2T(m-1) + b, & \text{if } m > 0 \end{cases}$$

Here $2 \cdot T(m-1)$ expresses the total time complexity of the two calls to whirlpool(m - 1), and a, b
indicate the constant time complexities to execute the rest of the statements when $m \le 0$ and
$m > 0$ respectively.
Solving the recurrence relation yields the following steps:

$$\begin{aligned}
T(m) &= 2 \cdot T(m-1) + b && \text{...(step 1)} \\
     &= 2(2T(m-2) + b) + b \\
     &= 2^2 T(m-2) + b(1 + 2) && \text{...(step 2)} \\
     &= 2^2(2 \cdot T(m-3) + b) + 3b \\
     &= 2^3 T(m-3) + b(1 + 2 + 2^2) && \text{...(step 3)}
\end{aligned}$$

Generalizing, in the $i$th step

$$T(m) = 2^i T(m-i) + b(1 + 2 + 2^2 + \dots + 2^{i-1}) \qquad \text{...(step i)}$$

When i = m,

$$\begin{aligned}
T(m) &= 2^m T(0) + b(1 + 2 + 2^2 + \dots + 2^{m-1}) \\
     &= a \cdot 2^m + b(2^m - 1) \\
     &= k \cdot 2^m + l, \quad \text{where } k = a + b \text{ and } l = -b \text{ are constants} \\
     &= O(2^m)
\end{aligned}$$

The time complexity of procedure whirlpool is therefore $O(2^m)$.
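
The exponential growth can be observed by counting invocations. The sketch below takes a = b = 1, so that the count obeys T(m) = 2T(m - 1) + 1 with T(0) = 1, which unrolls to $2^{m+1} - 1$; the function name `whirlpool_calls` is an assumption made for illustration.

```python
def whirlpool_calls(m):
    """Number of invocations triggered by whirlpool(m), counting the initial call."""
    if m <= 0:
        return 1                           # the base case makes no further calls
    return 1 + 2 * whirlpool_calls(m - 1)  # this call plus two recursive calls


print([whirlpool_calls(m) for m in range(0, 8)])
```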


Problem 2.6 The frequency count of line 3 in the following program fragment is

(a) $\frac{4n^2 - 2n}{2}$   (b) [illegible]   (c) $\frac{i^2 - 3i}{2}$   (d) $\frac{4n^2 - 6n}{2}$

1. i = 2n
2. for j = 1 to i
3.    for k = 3 to j
4.       m = m + 1;
5.    end
6. end

Solution: The frequency count of line 3 is given by

$$\sum_{j=1}^{i} \big((j - 3 + 1) + 1\big) = \sum_{j=1}^{2n} (j - 1) = \frac{4n^2 - 2n}{2}$$

Hence the correct option is a.


Problem 2.7 Find the frequency count and the time complexity of the following program
fragment:
1. for i = 20 to 30
2.    for j = 1 to n
3.       am = am + 1;
4.    end
5. end


Solution: The frequency count of the program fragment is shown in Table I 2.7.


Table I 2.7

Line number               Frequency count
1                         12
2                         $\sum_{i=20}^{30} (n + 1) = 11(n + 1)$
3                         $11n$
4                         $11n$
5                         11
Total frequency count     $33n + 34$


The total frequency count is 33n + 34 and the time complexity is therefore O(n).
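
The count 33n + 34 can be confirmed by instrumenting the fragment. In the sketch below each loop header is counted once per test, including the final failing test, and each end statement is counted once per iteration; that counting convention is an assumption, chosen because it reproduces the stated total. The function name `frequency_counts` is hypothetical.

```python
from collections import Counter


def frequency_counts(n):
    """Per-line execution counts for the fragment of Problem 2.7."""
    count = Counter()
    i = 20
    while True:
        count[1] += 1              # line 1: for i = 20 to 30 (header test)
        if i > 30:
            break
        j = 1
        while True:
            count[2] += 1          # line 2: for j = 1 to n (header test)
            if j > n:
                break
            count[3] += 1          # line 3: am = am + 1
            count[4] += 1          # line 4: inner end
            j += 1
        count[5] += 1              # line 5: outer end
        i += 1
    return count


counts = frequency_counts(7)
print(dict(counts), sum(counts.values()))
```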


Problem 2.8 State which of the following are true or false:
(i) $f(n) = 30n^2 2^n + 6n 2^n + 8n^2 = O(2^n)$

(ii) $g(n) = 9 \cdot 2^n + n^2 = \Omega(2^n)$

(iii) $h(n) = 9 \cdot 2^n + n^2 = \Theta(2^n)$


Solution:
(i) False.
For $f(n) = O(2^n)$, it is essential that

$$|f(n)| \le c \cdot |2^n|$$

(i.e.)

$$\frac{|30n^2 2^n + 6n 2^n + 8n^2|}{|2^n|} \le c$$

This is not possible since the left-hand side is an increasing, unbounded function of n.
(ii) True.
(iii) True.


Problem 2.9 Solve the following recurrence relation assuming $n = 2^k$:

$$C(n) = \begin{cases} 2, & n = 2 \\ 2 \cdot C(n/2) + 3, & n > 2 \end{cases}$$


Solution: The solution of the recurrence relation proceeds as given below:

$$\begin{aligned}
C(n) &= 2 \cdot C(n/2) + 3 && \text{...(step 1)} \\
     &= 2(2 \cdot C(n/4) + 3) + 3 \\
     &= 2^2 C(n/2^2) + 3 \cdot (1 + 2) && \text{...(step 2)} \\
     &= 2^2(2 \cdot C(n/2^3) + 3) + 3 \cdot (1 + 2) \\
     &= 2^3 C(n/2^3) + 3(1 + 2 + 2^2) && \text{...(step 3)}
\end{aligned}$$

In the $i$th step,

$$C(n) = 2^i C(n/2^i) + 3(1 + 2 + 2^2 + \dots + 2^{i-1}) \qquad \text{...(step i)}$$

Since $n = 2^k$, in the step when i = (k - 1),

$$\begin{aligned}
C(n) &= 2^{k-1} C(n/2^{k-1}) + 3(1 + 2 + 2^2 + \dots + 2^{k-2}) && \text{...(step } k-1\text{)} \\
     &= \frac{n}{2} C(2) + 3(2^{k-1} - 1) \\
     &= \frac{n}{2} \cdot 2 + 3\left(\frac{n}{2} - 1\right) \\
     &= 5 \cdot \frac{n}{2} - 3
\end{aligned}$$

Hence $C(n) = 5 \cdot n/2 - 3$.
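
The closed form can be cross-checked against the recurrence for powers of two. The sketch below does this; the function name `c_recurrence` is an assumption made for illustration.

```python
def c_recurrence(n):
    """C(2) = 2 and C(n) = 2 * C(n/2) + 3 for n > 2, with n a power of two."""
    return 2 if n == 2 else 2 * c_recurrence(n // 2) + 3


for k in range(1, 6):
    n = 2**k
    print(n, c_recurrence(n), 5 * n // 2 - 3)
```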


Review Questions


1. The frequency count of the statement “for k = 3 to (m + 2) do” is 
(a) (m + 2) (b) (m - 1) (c) (m + 1) (d) (m + 5) 
2. If functions f(n) and g(n), for a positive integer $n_0$ and a positive number C, are such that
$|f(n)| \ge C|g(n)|$, for all $n \ge n_0$, then
(a) $f(n) = \Omega(g(n))$ (b) $f(n) = O(g(n))$ (c) $f(n) = \Theta(g(n))$ (d) $f(n) = o(g(n))$
3. For $T(n) = 167n^5 + 12n^4 + 89n^3 + 9n^2 + n + 1$,
(a) T(n) = O(n) (b) T(n) = O(m) (c) T(n) = O0) (d) T(n) = O(n? + n) 
4. State whether true or false: 
(i) Exponential functions have rapid growth rates when compared to polynomial 
functions 
(ii) Therefore, exponential time algorithms run faster than polynomial time algorithms 
(a) (i) true (ii) true (b) (i) true (ii) false (c) (i) false (ii) false (d) (i) false (ii) true 
5. Find the odd one out: $O(n)$, $O(n^2)$, $O(n^3)$, $O(3^n)$
(a) $O(n)$ (b) $O(n^2)$ (c) $O(n^3)$ (d) $O(3^n)$
6. How does one measure the efficiency of algorithms?
7. Distinguish between best, worst and average case complexities of an algorithm.
8. Define O and Ω notations of time complexity.
9. Compare and contrast exponential time complexity with polynomial time complexity.
10. How are recursive programs analyzed?
11. Analyze the time complexity of the following program:

for send = 1 to n do 
for receive = 1 to send do 
for ack = 2 to receive do 
message = send - (receive + ack); 
end 
end 
end 
12. Solve the recurrence relation:

$$S(n) = \begin{cases} 2 \cdot S(n-1) + b \cdot n, & \text{if } n > 1 \\ a, & \text{if } n = 1 \end{cases}$$


CHAPTER 3


ARRAYS


3.1 Introduction
3.2 Array Operations
3.3 Number of Elements in an Array
3.4 Representation of Arrays in Memory
3.5 Applications


Introduction 3.1

In Chapter 1, an Abstract Data Type (ADT) was defined to be a set
of data objects and the fundamental operations that can be
performed on this set. In this regard, an array is an ADT whose
objects are sequences of elements of the same type and the two
operations performed on it are store and retrieve. Thus if a is an
array, the operations can be represented as STORE(a, i, e) and
RETRIEVE(a, i), where i is termed the index and e is the element
that is to be stored in the array. These functions are equivalent to
the programming language statements a[i] := e and a[i], where i is
termed the subscript and a the array variable name in programming
language parlance.

Arrays could be one-dimensional, two-dimensional, three-dimensional or in general
multi-dimensional. Figure 3.1 illustrates a one-dimensional and a two-dimensional array. It may be observed that while
one-dimensional arrays are mathematically likened to vectors, two-dimensional arrays are likened
to matrices. In this regard, two-dimensional arrays also have the terminologies of rows and
columns associated with them.


Fig. 3.1 Examples of arrays: (a) One-dimension, A[1:5] (b) Two-dimension, B[1:3, 1:2]


In Fig. 3.1, A[1:5] refers to a one-dimensional array where 1, 5 are referred to as the lower and 
upper indexes or the lower and upper bounds of the index range respectively. Similarly, B[1:3, 1:2] 
refers to a two-dimensional array with 1, 3 and 1, 2 being the lower and upper indexes of the 
rows and columns respectively. 




Also, each element of the array viz., A[i ] or B[i, j ] resides in a memory location also called a 
cell. Here cell refers to a unit of memory and is machine dependent. 





Array Operations 3.2 


An array when viewed as a data structure supports only two operations viz., 
(i) storage of values (i.e.) writing into an array (STORE (a, i, e) ) and, 
(ii) retrieval of values (i.e.) reading from an array ( RETRIEVE (a, i) ) 
For example, if A is an array of 5 elements then Fig. 3.2 illustrates the operations performed 
on A. 


Fig. 3.2 Array operations: Store and Retrieve
(columns: object, representation in memory, operations, result of the operations)


Number of Elements in an Array 3.3 


In this section, the computation of the size of an array by way of its number of elements is discussed.
This is important because, when arrays are declared in a program, it is essential that the number
of memory locations needed by the array is ‘booked’ beforehand.


One-dimensional array 


Let A[1:u] be a one-dimensional array. The size of the array, as is evident, is u and the elements
are A[1], A[2], ... A[u - 1], A[u]. In the case of the array A[l : u], where l is the lower bound and
u is the upper bound of the index range, the number of elements is given by (u - l + 1).


Example 3.1 The number of elements in
(i) A[1:26] = 26
(ii) A[5:53] = 49 (53 - 5 + 1)
(iii) A[-1:26] = 28


Two-dimensional array 


Let $A[1 : u_1, 1 : u_2]$ be a two-dimensional array where $u_1$ indicates the number of rows and
$u_2$ the number of columns in the array. Then the number of elements in A is $u_1 \cdot u_2$.
Generalizing, $A[l_1 : u_1, l_2 : u_2]$ has a size of $(u_1 - l_1 + 1)(u_2 - l_2 + 1)$ elements.
Figure 3.3 illustrates a two-dimensional array and its size.


Array: $A[1 : u_1, 1 : u_2]$ ($u_1$ rows, $u_2$ columns)

Representation:
    A[1, 1]      A[1, 2]      ...   A[1, $u_2$]
    A[2, 1]      A[2, 2]      ...   A[2, $u_2$]
    ...
    A[$u_1$, 1]    A[$u_1$, 2]    ...   A[$u_1$, $u_2$]

No. of elements in A: $u_1 \cdot u_2$


Fig. 3.3 Size of a two-dimensional array 


Example 3.2 The number of elements in
(i) A[1:10, 1:5] = 10 x 5 = 50
(ii) A[-1:2, 2:6] = 4 x 5 = 20
(iii) A[0:5, -1:6] = 6 x 8 = 48


Multi-dimensional array 


A multi-dimensional array $A[1 : u_1, 1 : u_2, \ldots, 1 : u_n]$ has a size of
$u_1 \cdot u_2 \cdots u_n$ elements, i.e. $\prod_{i=1}^{n} u_i$.

Figure 3.4 illustrates a three-dimensional array and its size. Generalizing, the array
$A[l_1 : u_1, l_2 : u_2, l_3 : u_3, \ldots, l_n : u_n]$ has a size of
$\prod_{i=1}^{n} (u_i - l_i + 1)$ elements.


Array: A[1:2, 1:2, 1:3]

Elements:
    A[1, 1, 1]  A[1, 1, 2]  A[1, 1, 3]
    A[1, 2, 1]  A[1, 2, 2]  A[1, 2, 3]
    A[2, 1, 1]  A[2, 1, 2]  A[2, 1, 3]
    A[2, 2, 1]  A[2, 2, 2]  A[2, 2, 3]

Number of elements: 2 x 2 x 3 = 12


Fig. 3.4 Size of a three-dimensional array 


Example 3.3 The number of elements in
(i) A[-1:3, 3:4, 2:6] = (3 - (-1) + 1)(4 - 3 + 1)(6 - 2 + 1) = 5 x 2 x 5 = 50
(ii) A[0:2, 1:2, 3:4, -1:2] = 3 x 2 x 2 x 4 = 48
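The size formulas of this section reduce to a single product over the dimensions. The following Python sketch checks them against Examples 3.1 to 3.3; the function name number_of_elements and the representation of the index ranges as (lower, upper) pairs are assumptions made for illustration.

```python
from math import prod


def number_of_elements(bounds):
    """Size of an array whose index ranges are given as (lower, upper) pairs.

    Each dimension contributes (upper - lower + 1) subscripts; the size is
    the product of these counts over all dimensions.
    """
    return prod(upper - lower + 1 for lower, upper in bounds)


print(number_of_elements([(5, 53)]))                          # Example 3.1 (ii): 49
print(number_of_elements([(-1, 2), (2, 6)]))                  # Example 3.2 (ii): 20
print(number_of_elements([(-1, 3), (3, 4), (2, 6)]))          # Example 3.3 (i): 50
print(number_of_elements([(0, 2), (1, 2), (3, 4), (-1, 2)]))  # Example 3.3 (ii): 48
```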


Representation of Arrays in Memory 3.4





How are arrays represented in memory? This is an important question at least from the 
compiler’s point of view. In many programming languages the name of the array is associated 
with the address of the starting memory location so as to facilitate efficient storage and retrieval. 
Also it is to be remembered that while the computer memory is considered one-dimensional 
(linear) it has to accommodate arrays which are multi-dimensional. Hence address calculation to 
determine the appropriate locations in the memory becomes important. 







In this respect, it is convenient to imagine a two-dimensional array $A[1 : u_1, 1 : u_2]$ as
$u_1$ number of one-dimensional arrays whose dimension is $u_2$. Again, in the case of
three-dimensional arrays, $A[1 : u_1, 1 : u_2, 1 : u_3]$ can be viewed as $u_1$ number of
two-dimensional arrays of size $u_2 \cdot u_3$. Figure 3.5 illustrates this idea. Generalizing, a
multi-dimensional array $A[1 : u_1, 1 : u_2, \ldots, 1 : u_n]$ is a colony of $u_1$ number of
arrays, each of dimension $A[1 : u_2, 1 : u_3, \ldots, 1 : u_n]$.

The arrays are stored in the memory in one of two ways, viz., row major order (also called
lexicographic order) or column major order. In the ensuing discussion we assume a row major
order representation. Figure 3.6 distinguishes between the two methods of representation.

[Figure: (a) Two-dimensional array viewed in terms of one-dimensional arrays;
(b) Three-dimensional array viewed in terms of two-dimensional arrays]

Fig. 3.5 Viewing higher-dimensional arrays in terms of their lower-dimensional counterparts


One-dimensional array


Consider the array $A[1 : u_1]$ and let $\alpha$ be the address of the starting memory location,
referred to as the base address of the array. Here, as is evident, A[1] occupies the memory
location whose address is $\alpha$, A[2] occupies $\alpha + 1$ and so on. In general, the address
of A[i] is given by $\alpha + (i - 1)$. Figure 3.7 illustrates the representation of a
one-dimensional array in memory. In general, for a one-dimensional array $A[l_1 : u_1]$ the
address of A[i] is given by $\alpha + (i - l_1)$, where $\alpha$ is the base address.


Example 3.4 For the arrays given below with base address $\alpha$ = 100, the addresses of the
array elements specified are computed as given below:


      Array        Element    Address
(i)   A[1:17]      A[7]       $\alpha$ + (7 - 1) = 100 + 6 = 106
(ii)  A[-2:23]     A[16]      $\alpha$ + (16 - (-2)) = 100 + 18 = 118


Two-dimensional array 


Consider the array $A[1 : u_1, 1 : u_2]$ which is to be stored in the memory. It is helpful to
imagine this array as $u_1$ number of one-dimensional arrays of length $u_2$. Thus if A[1, 1] is
stored in address $\alpha$, the base address, then A[i, 1] has address $\alpha + (i - 1)u_2$, and
A[i, j] has address $\alpha + (i - 1)u_2 + (j - 1)$. To understand this, let us imagine the
two-dimensional array A[i, j] to be a building with i floors each accommodating j rooms. To
access room A[i, 1], the first room on the ith floor, one has to traverse (i - 1) floors each
having $u_2$ rooms. In other words, $(i - 1) \cdot u_2$ rooms have to be







[Figure: (a) Row major order: the array $A[1 : u_1, 1 : u_2]$ laid out in memory as Row 1,
Row 2, ..., Row $u_1$, each occupying $u_2$ cells; (b) Column major order: the same array laid
out in memory as Column 1, Column 2, ..., Column $u_2$, each occupying $u_1$ cells]


Fig. 3.6 Row major order and column major order of a two-dimensional array 


[Figure: the array $A[1 : u_1]$ occupying the consecutive memory locations $\alpha$,
$\alpha + 1$, $\alpha + 2$, ..., $\alpha + (u_1 - 1)$]


Fig. 3.7 Representation of one-dimensional arrays in memory 


left behind before one knocks at the first room on the ith floor. Since $\alpha$ is the base
address, the address of A[i, 1] would be $\alpha + (i - 1)u_2$. Again, extending a similar
argument to access A[i, j], the jth room on the ith floor, one has to leave behind
$(i - 1)u_2$ rooms and reach the jth room of the ith floor. This again, as before, would compute
the address of A[i, j] as $\alpha + (i - 1)u_2 + (j - 1)$.
Figure 3.8 illustrates the representation of two-dimensional arrays in the memory.
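The difference between the row major layout assumed here and the column major alternative of Fig. 3.6 can be made concrete by listing the subscripts in the order in which they would occupy consecutive cells. The sketch below is illustrative only; the function names are assumptions.

```python
def row_major(u1, u2):
    """Subscripts of A[1:u1, 1:u2] in row major order (the last subscript varies fastest)."""
    return [(i, j) for i in range(1, u1 + 1) for j in range(1, u2 + 1)]


def column_major(u1, u2):
    """Subscripts of A[1:u1, 1:u2] in column major order (the first subscript varies fastest)."""
    return [(i, j) for j in range(1, u2 + 1) for i in range(1, u1 + 1)]


print(row_major(2, 3))
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
print(column_major(2, 3))
# [(1, 1), (2, 1), (1, 2), (2, 2), (1, 3), (2, 3)]
```

The position of (i, j) in the row major list is (i - 1) * u2 + (j - 1), which is exactly the offset added to the base address in the formula above.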

Observe that the addresses of array elements are expressed in terms of the cells, which hold 
the array. 

In general, for a two-dimensional array $A[l_1 : u_1, l_2 : u_2]$ the address of A[i, j] is given by

$$\alpha + (i - l_1)(u_2 - l_2 + 1) + (j - l_2)$$





[Figure: memory laid out as Row 1, Row 2, ..., Row $u_1$, each of $u_2$ cells; A[i, 1] is at
$\alpha + (i - 1)u_2$ and A[i, j] at $\alpha + (i - 1)u_2 + (j - 1)$]


Fig. 3.8 Representation of a two-dimensional array in memory 


Example 3.5 For the arrays given below with $\alpha$ = 220 as the base address, the addresses of
the elements specified are computed as given below:


Array                 Element     Address
A[1 : 10, 1 : 5]      A[8, 3]     220 + (8 - 1)5 + (3 - 1) = 257
A[-2 : 4, -6 : 10]    A[3, -5]    220 + (3 - (-2))(10 - (-6) + 1) + (-5 - (-6)) = 306
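The general two-dimensional formula translates directly into code. The sketch below reproduces the two results of Example 3.5; the function name address_2d is an assumption, and the upper row bound 10 used for the first array is likewise assumed (it does not enter the calculation).

```python
def address_2d(base, bounds, i, j):
    """Row major address of A[i, j] for A[l1:u1, l2:u2], counted in cells.

    bounds is ((l1, u1), (l2, u2)); base is the address of A[l1, l2].
    """
    (l1, u1), (l2, u2) = bounds
    if not (l1 <= i <= u1 and l2 <= j <= u2):
        raise IndexError("subscript out of range")
    row_length = u2 - l2 + 1             # number of cells in one row
    return base + (i - l1) * row_length + (j - l2)


print(address_2d(220, ((1, 10), (1, 5)), 8, 3))      # 257
print(address_2d(220, ((-2, 4), (-6, 10)), 3, -5))   # 306
```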


Three-dimensional array 


Consider the three-dimensional array $A[1 : u_1, 1 : u_2, 1 : u_3]$. As discussed before, we shall
imagine it to be $u_1$ number of two-dimensional arrays of dimension $u_2 \cdot u_3$. Reverting to
the analogy of building-floor-rooms, the three-dimensional array A[i, j, k] could be viewed as a
colony of i buildings each having j floors, with each floor accommodating k rooms.

To access A[i, 1, 1], i.e. the first room on the first floor of the ith building, one has to walk
past (i - 1) buildings each comprising $u_2 \cdot u_3$ rooms, before climbing on to the first floor
of the ith building to reach the first room! This means the address of A[i, 1, 1] would be
$\alpha + (i - 1)u_2 u_3$. Similarly, the address of A[i, j, 1] requires accessing the first room
on the jth floor of the ith building, which works out to $\alpha + (i - 1)u_2 u_3 + (j - 1)u_3$.
Proceeding on similar lines, the address of A[i, j, k] is given by
$\alpha + (i - 1)u_2 u_3 + (j - 1)u_3 + (k - 1)$.

Figure 3.9 illustrates the representation of three-dimensional arrays in the memory. 


[Figure: the array $A[1 : u_1, 1 : u_2, 1 : u_3]$ stored as $u_1$ number of two-dimensional
arrays, each occupying $u_2 u_3$ cells, with a view of the ith two-dimensional array slice]





Fig. 3.9 Representation of three-dimensional arrays in the memory 




In general, for a three-dimensional array $A[l_1 : u_1, l_2 : u_2, l_3 : u_3]$ the address of
A[i, j, k] is given by

$$\alpha + (i - l_1)(u_2 - l_2 + 1)(u_3 - l_3 + 1) + (j - l_2)(u_3 - l_3 + 1) + (k - l_3)$$


Example 3.6 For the arrays given below with base address $\alpha$ = 110, the addresses of the
elements specified are as given below:


Array                        Element         Address
A[1 : 5, 1 : 2, 1 : 3]       A[2, 1, 3]      110 + (2 - 1)6 + (1 - 1)3 + (3 - 1) = 118
A[-2 : 4, -6 : 10, 1 : 3]    A[-1, -4, 2]    110 + (-1 - (-2)) x 17 x 3 + (-4 - (-6))3 + (2 - 1) = 168


N-dimensional array 


Let $A[1 : u_1, 1 : u_2, \ldots, 1 : u_N]$ be an N-dimensional array. The address calculation for
the retrieval of an element is as given below:


Element: $A[i_1, i_2, i_3, \ldots, i_N]$

Address: $\alpha + (i_1 - 1)u_2 u_3 \cdots u_N + (i_2 - 1)u_3 u_4 \cdots u_N + \cdots + (i_{N-1} - 1)u_N + (i_N - 1)$
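For arbitrary lower bounds, the N-dimensional row major address can be accumulated one dimension at a time, in the manner of Horner's rule, instead of forming each product of dimension sizes separately. The sketch below is one way to do this; the function name address_nd and its argument layout are assumptions. It agrees with the two- and three-dimensional formulas given earlier in this section.

```python
def address_nd(base, bounds, index):
    """Row major address, in cells, of A[index] for index ranges given as (lower, upper) pairs.

    The offset is built up as ((o1 * n2 + o2) * n3 + o3) ..., where ok is the
    distance of the kth subscript from its lower bound and nk is the number
    of subscripts in the kth dimension.
    """
    if len(bounds) != len(index):
        raise ValueError("one subscript is needed per dimension")
    offset = 0
    for (lower, upper), i in zip(bounds, index):
        if not lower <= i <= upper:
            raise IndexError("subscript out of range")
        offset = offset * (upper - lower + 1) + (i - lower)
    return base + offset


# Second array of Example 3.6: A[-2:4, -6:10, 1:3], element A[-1, -4, 2], base address 110.
print(address_nd(110, [(-2, 4), (-6, 10), (1, 3)], (-1, -4, 2)))   # 168
# Second array of Example 3.5: A[-2:4, -6:10], element A[3, -5], base address 220.
print(address_nd(220, [(-2, 4), (-6, 10)], (3, -5)))               # 306
```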


Applications 3.5 





In this section, we introduce two concepts that are useful to computer science and also serve as 
applications of arrays viz., Sparse matrices and ordered lists. 


Sparse matrix 


A matrix is a mathematical object which finds its applications in various scientific problems. A
matrix is an arrangement of m x n elements as m rows and n columns. A sparse matrix
is a matrix with zeros as the dominating elements. There is no precise definition for a sparse
matrix. In other words, the "sparseness" is relatively defined. Figure 3.10 illustrates a matrix and
a sparse matrix.


[Figure: two example matrices, (a) Matrix and (b) Sparse Matrix]


Fig. 3.10 Matrix and a sparse matrix 







A matrix consumes a lot of space in memory. Thus, a 1000 x 1000 matrix needs 1 million 
storage locations in memory. Imagine the situation when the matrix is sparse! To store a handful 
of non-zero elements, voluminous memory is allotted and thereby wasted! 

In such a case, to save valuable storage space, we resort to a triple representation viz., (i, j,
value) to represent each non-zero element of the sparse matrix. In other words, a sparse matrix
A is represented by another matrix B[0 : t, 1 : 3] with t + 1 rows and 3 columns. Here t refers to
the number of non-zero elements in the sparse matrix. While rows 1 to t record the details
pertaining to the non-zero elements as triples (that is, 3 columns), the zeroth row viz., B[0, 1],
B[0, 2] and B[0, 3] records the number of rows, the number of columns and the number of
non-zero elements, respectively, of the original sparse matrix A.
Figure 3.11 illustrates a sparse matrix representation.


A[1:7, 1:6]                     B[0:5, 1:3]

 0   1   0   0   0   0           7   6   5
 0   0   0   0   0   0           1   2   1
 2   0   0   1   0   0           3   1   2
 0   0   0   0   0   0           3   4   1
 0   0   0   0   0   0           6   2  -3
 0  -3   0   0   0   0           7   6   1
 0   0   0   0   0   1
Fig. 3.11 Sparse matrix representation 
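The triple representation can be obtained by a single row-by-row scan of the matrix. The sketch below is illustrative: the function name to_triples is an assumption, and so is the layout of row 0 as (number of rows, number of columns, number of non-zero elements). Rows and columns are numbered from 1, as in the text.

```python
def to_triples(a):
    """Return the triple representation B of the matrix a (a list of equal-length rows).

    B[0] is assumed to hold (rows, columns, number of non-zero elements);
    B[1:] holds one (i, j, value) triple per non-zero element, in row major order.
    """
    rows, cols = len(a), len(a[0])
    triples = [(i + 1, j + 1, a[i][j])
               for i in range(rows)
               for j in range(cols)
               if a[i][j] != 0]
    return [(rows, cols, len(triples))] + triples


# A 7 x 6 sparse matrix with five non-zero elements.
A = [
    [0,  1, 0, 0, 0, 0],
    [0,  0, 0, 0, 0, 0],
    [2,  0, 0, 1, 0, 0],
    [0,  0, 0, 0, 0, 0],
    [0,  0, 0, 0, 0, 0],
    [0, -3, 0, 0, 0, 0],
    [0,  0, 0, 0, 0, 1],
]
B = to_triples(A)
for row in B:
    print(row)
```

Here B needs 6 x 3 = 18 cells against the 42 cells of A; the saving grows with the size of the matrix as long as the number of non-zero elements stays small.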


A simple example of a sparse matrix arises in the arrangement of choice of say 5 elective 
courses from the specified list of 100 elective courses, by 20000 students of a university. The 
arrangement of choice would turn out to be a matrix with 20000 rows and 100 columns with just 
5 non-zero entries per row, indicative of the choice made. Such a matrix could definitely be 
classified as sparse! 


Ordered lists 


One of the simplest and most useful data objects in computer science is the ordered list or
linear list. An ordered list can be either empty or non-empty. In the latter case, the elements of
the list are known as atoms, chosen from a set D. Ordered lists provide a variety of operations
such as retrieval, insertion, deletion, update etc. The most common way to represent an ordered
list is by using a one-dimensional array. Such a representation is termed sequential mapping,
though better forms of representation have been presented in the literature.


Example 3.7 The following are ordered lists
(i) (sun, mon, tue, wed, thu, fri, sat)
(ii) $(a_1, a_2, a_3, a_4, \ldots, a_N)$
(iii) (Unix, CP/M, Windows, Linux)
The ordered lists represented as one-dimensional arrays are given as follows:


WEEK[1 : 7]

    sun    mon    tue    wed    thu    fri    sat
    [1]    [2]    [3]    [4]    [5]    [6]    [7]


VARIABLE[1 : N]

    $a_1$    $a_2$    $a_3$    ...    $a_N$
    [1]    [2]    [3]    ...    [N]


OS[1 : 4]

    Unix    CP/M    Windows    Linux
    [1]     [2]     [3]        [4]


We illustrate below some of the operations performed on ordered lists, with examples. 


Operation                        Original ordered list        Resultant ordered list after the operation

Insertion (insert $a_6$)           $(a_1, a_2, a_7, a_9)$         $(a_1, a_2, a_6, a_7, a_9)$

Deletion (delete $a_9$)            $(a_1, a_2, a_7, a_9)$         $(a_1, a_2, a_7)$

Update (update $a_2$ to $a_5$)       $(a_1, a_2, a_7, a_9)$         $(a_1, a_5, a_7, a_9)$
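The three operations can be sketched on an array-backed list as follows. This is an illustration under stated assumptions: the function names are invented here, the elements are modelled by plain integers (the subscript k standing for the element $a_k$), and the list is assumed to be kept in increasing order.

```python
import bisect


def insert(lst, item):
    """Return a new list with item placed so that the increasing order is preserved."""
    pos = bisect.bisect_left(lst, item)     # position of the first element >= item
    return lst[:pos] + [item] + lst[pos:]   # elements from pos onwards shift right


def delete(lst, item):
    """Return a new list without item (ValueError if item is absent)."""
    pos = lst.index(item)
    return lst[:pos] + lst[pos + 1:]        # elements after pos shift left


def update(lst, old, new):
    """Return a new list in which old is replaced, in place, by new."""
    pos = lst.index(old)
    return lst[:pos] + [new] + lst[pos + 1:]


L = [1, 2, 7, 9]
print(insert(L, 6))      # [1, 2, 6, 7, 9]
print(delete(L, 9))      # [1, 2, 7]
print(update(L, 2, 5))   # [1, 5, 7, 9]
```

With sequential mapping, insertion and deletion move every element after the affected position, so each costs time proportional to the length of the list in the worst case.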


ADT for Arrays 


Data objects:

A set of elements of the same type stored in a sequence

Operations:

Store value VAL in the ith element of the array ARRAY
    ARRAY[i] = VAL

Retrieve the value in the ith element of the array ARRAY as VAL
    VAL = ARRAY[i]


Summary


> Array as an ADT supports only two operations, STORE and RETRIEVE.

> Arrays may be one-, two- or multi-dimensional and are stored in consecutive memory locations,
in either row major order or column major order.

> Since memory is considered one-dimensional and arrays may be multi-dimensional, it
becomes essential to know the representation of arrays in memory, especially from the
compiler's point of view. The address calculation of array elements has been elaborately
discussed.

> Two concepts viz., sparse matrices and ordered lists, of use to computer science have been
briefly described as applications of arrays.


Illustrative Problems


Problem 3.1 The following details are available about an array RESULT. Find the address of
RESULT[17].

Base address                   : 520
Index range                    : 1:20
Array type                     : Real
Size of the memory location    : 4 bytes


Solution: Since RESULT[1:20] is a one-dimensional array, the address of RESULT[17], counted in
cells, is given by base address + (17 - lower index). However, each cell is made up of 4 bytes,
hence the address is given by

base address + (17 - lower index) x 4 = 520 + (17 - 1) x 4 = 584

The array RESULT may be visualized as shown.

RESULT[1:20]

Address:    520          524          ...
Element:    RESULT[1]    RESULT[2]    ...    RESULT[20]
            (each cell occupies 4 bytes)
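When a cell spans several bytes, the offset counted in cells is simply scaled by the cell size. A minimal sketch, with the function name byte_address assumed for illustration:

```python
def byte_address(base, lower, i, cell_size):
    """Byte address of A[i] for a one-dimensional array A[lower:upper].

    base is the byte address of A[lower]; cell_size is the number of bytes per element.
    """
    return base + (i - lower) * cell_size


print(byte_address(520, 1, 17, 4))   # RESULT[17]: 584
print(byte_address(520, 1, 2, 4))    # RESULT[2]: 524
```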


Problem 3.2 For the following array B, compute
(i) the dimension of B
(ii) the space occupied by B in the memory
(iii) the address of B[7, 2]

Array                          : B
Base address                   : 1003
Row index                      : 0:15
Column index                   : 0:5
Size of the memory location    : 4 bytes

Solution: 


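(i) B is a two-dimensional array of 16 rows and 6 columns, so the number of elements in B is
16 x 6 = 96
(ii) The space occupied by B is 96 x 4 = 384 bytes
(iii) The address of B[7, 2], expressed in terms of cells, is given by

1003 + (7 - 0) x 6 + (2 - 0) = 1003 + 42 + 2
= 1047

If the address is to be expressed in bytes, as in Problem 3.1, the offset of 44 cells is scaled
by the cell size, giving 1003 + 44 x 4 = 1179.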







Problem 3.3 A programming language permits indexing of arrays with character subscripts;
for example, CHR_ARRAY['A' : 'D']. In such a case the elements of the array are CHR_ARRAY['A'],
CHR_ARRAY['B'] etc., and the ordinal numbers (ORD) of the characters viz., ORD('A') = 1,
ORD('B') = 2, ..., ORD('Z') = 26 are used to denote the index.

Now two arrays TEMP[1 : 5, -1 : 2] and CODE['A' : 'Z', 1 : 2] are stored in the memory beginning
from address 500. Also CODE succeeds TEMP in storage. Calculate the addresses of (i) TEMP[5, -1]
(ii) CODE['N', 2] and (iii) CODE['Z', 1].


Solution: From the details given, the representation of TEMP and CODE arrays in memory is
as given below:


| TEMP[1, -1]  TEMP[1, 0]  ...  TEMP[5, 2] | CODE['A', 1]  ...  CODE['Z', 2] |
|<--------- TEMP[1 : 5, -1 : 2] --------->|<--- CODE['A' : 'Z', 1 : 2] --->|


(i) The address of TEMP[5, -1] is given by
base address + (5 - 1)(2 - (-1) + 1) + (-1 - (-1))
= 500 + 16
= 516


(ii) To obtain the addresses of CODE elements it is necessary to obtain its base address, which
is the immediate location after TEMP[5, 2], the last element of array TEMP.
Hence the address of TEMP[5, 2] is computed as
500 + (5 - 1)(2 - (-1) + 1) + (2 - (-1))
= 500 + 16 + 3
= 519


Therefore the base address of CODE is given by 520.
Now the address of CODE['N', 2] is given by
base address of CODE + (ORD('N') - ORD('A'))(2 - 1 + 1) + (2 - 1)
= 520 + (14 - 1)2 + 1
= 547
(iii) The address of CODE['Z', 1] is computed as
base address of CODE + (ORD('Z') - ORD('A'))(2 - 1 + 1) + (1 - 1)
= 520 + (26 - 1)(2) + 0
= 570
Note: The base address of CODE may also be computed as
base address of TEMP + (number of elements in TEMP - 1) + 1
= 500 + (5 x 4 - 1) + 1
= 520
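The calculation can be cross-checked with a short Python sketch. The helper names (ord_letter, size, address_2d) are assumptions made for illustration; ord_letter implements the problem's convention ORD('A') = 1, ..., ORD('Z') = 26, and addresses are counted in cells.

```python
def ord_letter(ch):
    """Ordinal number of an upper-case letter, with ORD('A') = 1 ... ORD('Z') = 26."""
    return ord(ch) - ord('A') + 1


def size(bounds):
    """Number of elements of a two-dimensional array with the given index ranges."""
    (l1, u1), (l2, u2) = bounds
    return (u1 - l1 + 1) * (u2 - l2 + 1)


def address_2d(base, bounds, i, j):
    """Row major address of element [i, j], counted in cells."""
    (l1, _), (l2, u2) = bounds
    return base + (i - l1) * (u2 - l2 + 1) + (j - l2)


TEMP = ((1, 5), (-1, 2))
CODE = ((ord_letter('A'), ord_letter('Z')), (1, 2))

temp_base = 500
code_base = temp_base + size(TEMP)     # CODE starts right after the 20 cells of TEMP

print(address_2d(temp_base, TEMP, 5, -1))               # (i)   516
print(code_base)                                        #       520
print(address_2d(code_base, CODE, ord_letter('N'), 2))  # (ii)  547
print(address_2d(code_base, CODE, ord_letter('Z'), 1))  # (iii) 570
```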







Review Questions


1. Which among the following pairs of operations is supported by an array ADT? 


(i) store and retrieve 

(ii) insert and delete 
(iii) copy and delete 
(iv) append and copy 

(a) (i) (b) (ii) (c) (iii) (d) (iv) 


2. The number of elements in an array $ARRAY[l_1 : u_1, l_2 : u_2]$ is given by

(a) $(u_1 - l_1 - 1)(u_2 - l_2 - 1)$    (b) $(u_1 \cdot u_2)$
(c) $(u_1 - l_1)(u_2 - l_2)$            (d) $(u_1 - l_1 + 1)(u_2 - l_2 + 1)$

3. A multi-dimensional array OPEN[0 : 2, 10 : 20, 3 : 4, -10 : 2] contains _______ elements.
(a) 240 (b) 858 (c) 390 (d) 160

4. For the array $A[1 : u_1, 1 : u_2]$ where $\alpha$ is the base address, A[i, 1] has its address
given by
(a) $(i - 1)u_2$    (b) $\alpha + (i - 1)u_2$    (c) $\alpha + i \cdot u_2$    (d) $\alpha + (i - 1) \cdot u_1$


5. For the array $A[1 : u_1, 1 : u_2, 1 : u_3]$ where $\alpha$ is the base address, the address of
A[i, j, 1] is given by

(a) $\alpha + (i - 1)u_2 u_3 + (j - 1)u_3$    (b) $\alpha + i \cdot u_2 u_3 + j \cdot u_3$

(c) $\alpha + (i - 1)u_1 u_2 + (j - 1)u_2$    (d) $\alpha + i \cdot u_1 u_2 + j \cdot u_2$
6. Distinguish between the row major and column major ordering of an array.
7. For an N-dimensional array $A[1 : u_1, 1 : u_2, \ldots, 1 : u_N]$ obtain the address of the
element $A[i_1, i_2, i_3, \ldots, i_N]$ given $\beta$ to be the home address.
8. For the following sparse matrix obtain an array representation.


0   0   0  -7   0
0   5   0   0   0
3   0   6   0  -1
0   0   0   0   0
5   0   0   0   0
0   0   0   0   0
9   0   0   4   0


Programming Assignments


1. Declare a one, two and a three-dimensional array in a programming language (such as C)
which has the capability to display the addresses of array elements. Verify the various
address calculation formulae that you have learnt in this chapter against the arrays that you
have declared in the program.


2. For the matrix A given below obtain a sparse matrix representation B. Write a program to
(i) Obtain B given matrix A as input, and
(ii) Obtain the transpose of A using matrix B.

A : 10 x 12

        1   2   3   4   5   6   7   8   9  10  11  12

  1     0   0   0   0   0   0   0   0   0   0   0   0
  2     0  -1   0   0   0   2   0   0   0   0   0   0
  3     0   0   0   0   0   0   0   0   0   0   0   0
  4     0   0   0   0   0   0   0   0   0   0   0   0
  5     4   0   0  -3   0   0   0   0   0   1   0   0
  6     0   0   0   0   0   0   0   0   0   0   0   0
  7     0   0   0   0   0   0   0   0   0   0   0   0
  8    -1   0   0   0   5   0   0   0   0   0   0   0
  9     0   0   0   0   0   0   2   0   0   4   0   0
 10     0   0   0   0   0   0   0   1   1   0   0   0


3. Open an ordered list $L = (d_1, d_2, \ldots, d_n)$ where each $d_i$ is the name of a peripheral
device, and which is maintained in alphabetical order.
Write a program to
(i) Insert a device $d_k$ into the list L
(ii) Delete an existing device $d_i$ from L. In this case the new ordered list should be
$L^{new} = (d_1, d_2, \ldots, d_{i-1}, d_{i+1}, \ldots, d_n)$ with (n - 1) elements
(iii) Find the length of L
(iv) Update device $d_j$ to $d_l$ and print the new list.




CHAPTER 4


STACKS 





4.1 Introduction
4.2 Stack Operations
4.3 Applications

In this chapter we introduce the stack data structure, the operations supported by it and
their implementation. Also, we illustrate two of its useful applications in computer science
among the innumerable available.


Introduction 4.1 


A stack is an ordered list with the restriction that elements are added to or deleted from only
one end of the list, termed the top of stack. The other end of the list, which lies 'inactive', is
termed the bottom of stack.

Thus if S is a stack with three elements a, b, c where c occupies the top of stack position, and 
if d were to be added, the resultant stack contents would be a, b, c, d. Note that d occupies the 
top of stack position. Again, initiating a delete or remove operation would automatically throw 
out the element occupying the top of stack, viz., d. Figure 4.1 illustrates this functionality of the 
stack data structure. 





[Figure: three views of the stack, each with the bottom of stack on the left and the top of stack
on the right. Ordered list of elements as a stack: a b c. Add element d to stack: a b c d.
Remove element from stack: a b c]


Fig. 4.1 Stack and its functionality 


It needs to be observed that during insertion of elements into the stack it is essential that their
identities are specified, whereas for removal no identity need be specified since, by virtue of its
functionality, the element which occupies the top of stack position is automatically removed.

The stack data structure therefore obeys the principle of Last In First Out (LIFO). In other 
words, elements inserted or added into the stack join last and those that joined last are the first 
to be removed. 

Some common examples of a stack occur during the serving of slices of bread arranged as a
pile on a platter or during the usage of an elevator (Fig. 4.2). It is obvious that when one adds
a slice to a pile or removes one for serving, it is the top of the pile that is affected. Similarly, in




[Figure: a pile of bread slices and an elevator]


Fig. 4.2 Common examples of a stack 


the case of an elevator, the last person to board the cabin has to be the first person to alight from
it (at least to make room for the others to alight!).


Stack Operations 4.2 


The two operations which the stack data structure supports are
(i) Insertion or addition of elements, known as Push
(ii) Deletion or removal of elements, known as Pop
Before we discuss the operations supported by the stack in detail, it is essential to know how
stacks are implemented.


Stack implementation 


A common and basic method of implementing stacks is to make use of another fundamental
data structure viz., arrays. While arrays are sequential data structures, the alternative of
employing linked data structures has also been successfully attempted and applied. We discuss
this elaborately in Chapter 7. In this chapter we confine our discussion to the implementation of
stacks using arrays.

Figure 4.3 illustrates an array based implementation of stacks. This is fairly convenient 
considering the fact that stacks are uni-dimensional ordered lists and so are arrays which despite 
their multi-dimensional structure are inherently associated with a one-dimensional consecutive 
set of memory locations. (Refer Chapter 3). 





[Figure: a stack shown alongside its array representation STACK[1 : 7], with cells [1] to [7]
and the top of stack and bottom of stack positions marked]


Fig. 4.3 Array implementation of stacks 







Figure 4.3 shows a stack of four elements R, S, V, G represented by an array STACK[1:7]. In
general, if a stack is represented as an array STACK[1 : n] then n elements and not one more can
be stored in the stack. It therefore becomes essential to issue a signal or warning termed
STACK_FULL when an attempt is made to push more than n elements into the stack.

Again, during a pop operation, it is essential to ensure that one does not delete from an empty
stack! Hence the necessity for a signal or a warning termed STACK_EMPTY during the
implementation of the pop operation. While implementation of stacks using arrays necessitates
checking for STACK_FULL/STACK_EMPTY conditions during push/pop operations respectively, the
implementation of stacks with linked data structures dispenses with these testing conditions.


Implementation of push and pop operations 


Let STACK [1:n] be an array implementation of a stack and top be a variable recording the 
current top of stack position. top is initialized to 0. item is the element to be pushed into the 
stack. n is the maximum capacity of the stack. 


Algorithm 4.1: Implementation of push operation on a stack


procedure PUSH(STACK, n, top, item)
    if (top = n) then STACK_FULL;
    else
        { top = top + 1;
          STACK[top] = item;   /* store item as top element of STACK */
        }
end PUSH





In the case of pop operation, as said earlier, no element identity need be specified since by default 
the element occupying the top of stack position is deleted. However, in Algorithm 4.2, item is 
used as an output variable which stores a copy of the element removed. 


Algorithm 4.2: Implementation of pop operation on a stack


procedure POP(STACK, top, item)
    if (top = 0) then STACK_EMPTY;
    else { item = STACK[top];
           top = top - 1;
         }
end POP





It is evident from the algorithms that to perform a single push/pop operation the time complexity 
is O (1). 
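Algorithms 4.1 and 4.2 can be sketched in Python as below. This is one possible rendering, not the only one: the class and exception names are assumptions, the STACK_FULL and STACK_EMPTY signals are modelled as exceptions, and cell k - 1 of the Python list plays the role of STACK[k]. The device names used in the demonstration are those of Example 4.1.

```python
class StackFull(Exception):
    """Signals the STACK_FULL condition."""


class StackEmpty(Exception):
    """Signals the STACK_EMPTY condition."""


class ArrayStack:
    """Stack of capacity n held in a fixed-size array, as in STACK[1:n]."""

    def __init__(self, n):
        self.n = n
        self.cells = [None] * n   # cells[k - 1] corresponds to STACK[k]
        self.top = 0              # 0 means the stack is empty

    def push(self, item):
        if self.top == self.n:
            raise StackFull
        self.top += 1
        self.cells[self.top - 1] = item   # store item as the top element

    def pop(self):
        if self.top == 0:
            raise StackEmpty
        item = self.cells[self.top - 1]
        self.top -= 1             # the cell itself is not erased
        return item


DEVICE = ArrayStack(3)
for name in ("PEN", "PLOTTER", "JOY STICK"):
    DEVICE.push(name)
try:
    DEVICE.push("PRINTER")        # a fourth push exceeds the capacity of 3
except StackFull:
    print("STACK_FULL")
item = DEVICE.pop()
print(item, DEVICE.top)           # JOY STICK 2
```

Both operations touch only the top position, which is why each takes constant time regardless of how many elements the stack holds.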


Example 4.1 Consider a stack DEVICE[1:3] of peripheral devices. The insertion of the four
items PEN, PLOTTER, JOY STICK and PRINTER into DEVICE and a deletion are illustrated in
Table 4.1.


Table 4.1 Push/pop operations on stack DEVICE[1:3]


1. Push 'PEN' into DEVICE[1:3]
   Stack before operation : [1] -    [2] -        [3] -            top = 0
   Algorithm invocation   : PUSH(DEVICE, 3, 0, 'PEN')
   Stack after operation  : [1] PEN  [2] -        [3] -            top = 1
   Remarks                : Push 'PEN' successful

2. Push 'PLOTTER' into DEVICE[1:3]
   Stack before operation : [1] PEN  [2] -        [3] -            top = 1
   Algorithm invocation   : PUSH(DEVICE, 3, 1, 'PLOTTER')
   Stack after operation  : [1] PEN  [2] PLOTTER  [3] -            top = 2
   Remarks                : Push 'PLOTTER' successful

3. Push 'JOY STICK' into DEVICE[1:3]
   Stack before operation : [1] PEN  [2] PLOTTER  [3] -            top = 2
   Algorithm invocation   : PUSH(DEVICE, 3, 2, 'JOY STICK')
   Stack after operation  : [1] PEN  [2] PLOTTER  [3] JOY STICK    top = 3
   Remarks                : Push 'JOY STICK' successful

4. Push 'PRINTER' into DEVICE[1:3]
   Stack before operation : [1] PEN  [2] PLOTTER  [3] JOY STICK    top = 3
   Algorithm invocation   : PUSH(DEVICE, 3, 3, 'PRINTER')
   Stack after operation  : [1] PEN  [2] PLOTTER  [3] JOY STICK    top = 3
   Remarks                : Push 'PRINTER' failure! STACK_FULL condition invoked

5. Pop from DEVICE[1:3]
   Stack before operation : [1] PEN  [2] PLOTTER  [3] JOY STICK    top = 3
   Algorithm invocation   : POP(DEVICE, 3, ITEM)
   Stack after operation  : [1] PEN  [2] PLOTTER  [3] JOY STICK    top = 2
   Remarks                : ITEM = 'JOY STICK'. Pop operation successful




Note that in operation 5 which is a pop operation, the top pointer is merely decremented as a 
mark of deletion. No physical erasure of data is carried out. 




Applications 4.3 


Stacks have found innumerable applications in computer science and other allied areas. In this 
section we introduce two applications of stacks which are useful in computer science, viz., 
(i) Recursive programming, and (ii) Evaluation of expressions 


Recursive programming 


The concept of recursion and recursive programming was introduced in Chapter 2. In this
section we demonstrate, through a sample recursive program, how stacks are helpful in handling
recursion. Consider the recursive pseudo-code for factorial computation shown in Fig. 4.4.
Observe the recursive call in Step 3. It is essential that during the computation of n!, the 
procedure does not lead to an endless series of calls to itself! Hence the need for a base case 
0! = 1 which is in Step 1. The spate of calls made by procedure FACTORIAL( ) to itself based on 
the value of n, can be viewed as FACTORIAL( ) replicating itself as many times as it calls itself 
with varying values of n. Also, all these procedures await normal termination before the final 
output of n! is completed and displayed by the very first call made to FACTORIAL( ). 
A procedural call would have a normal termination only when either the base case is executed 
(Step 1) or the recursive case has successfully ended, (i.e.) Steps 2-5 have completed their 
execution. 

During the execution, to keep track of the calls made to itself and to record the status of the
variables at the time of the call, a stack data structure is used. Figure 4.5 illustrates the various
snapshots of the stack during the execution of FACTORIAL (5). Note that the values of the three
variables of the procedure FACTORIAL( ) viz., the parameter n and the local variables x and y,
are kept track of in the stack data structure.


procedure FACTORIAL (n)
Step 1:    if (n = 0) then FACTORIAL = 1;
Step 2:    else { x = n - 1;
Step 3:           y = FACTORIAL (x);
Step 4:           FACTORIAL = n * y; }
Step 5: end FACTORIAL


Fig. 4.4 Recursive procedure to compute n! 


When the procedure FACTORIAL (5) is initiated (Fig. 4.5(a)) and executed (Fig. 4.5(b)) x obtains 
the value 4 and the control flow moves to Step 3 in the procedure FACTORIAL (5). This initiates 
the next call to the procedure as FACTORIAL (4). Observe that the first call (FACTORIAL (5) ) has not 
yet finished its execution when the next call (FACTORIAL (4)) to the procedure has been issued. 
Therefore there is this need to preserve the values of the variables used viz., n, x, y, in the 
preceding calls. Hence the need for a stack data structure. 

Every new procedure call pushes the current values of the parameters involved into the stack, 
thereby preserving the values used by the earlier calls. Figures 4.5(c-d) illustrate the contents 
of the stack during the execution of FACTORIAL (4) and subsequent procedure calls. During the 
execution of FACTORIAL (0) (Fig. 4.5(e)) Step 1 of the procedure is satisfied and this terminates 
the procedure call yielding the value FACTORIAL = 1. Since the call for FACTORIAL (0) was 
initiated in Step 3 of the previous call (FACTORIAL (1) ), y acquires the value of FACTORIAL (0) (i.e.) 







[Figure: eight snapshots of the stack, each recording the values of n, x and y for the calls in
progress]
(a) Invocation of FACTORIAL (5)
(b) During the execution of FACTORIAL (5); the marker in y indicates the call to FACTORIAL (4)
(Step 3)
(c) Invoking FACTORIAL (3) during the execution of FACTORIAL (4)
(d) Stack contents after subsequent calls and during the execution of FACTORIAL (1), with the
marker indicating the call to FACTORIAL (0)
(e) Invocation of FACTORIAL (0)
(f) FACTORIAL (0) has normal termination. Obtains the value of 0! = 1 and returns to its point
of invocation. Note y of FACTORIAL (1) receiving the computed value
(g) FACTORIAL (1) termination computes 1! = 1 and returns it to the point of invocation in
FACTORIAL (2). Note y of FACTORIAL (2) receiving the value
(h) Stack contents, after all other calls except FACTORIAL (5) have been normally terminated


Fig. 4.5 Snapshots of the stack data structure during the execution of the procedural call
FACTORIAL(5)
1 and the execution control moves to Step 4 to compute FACTORIAL = n * y (i.e.) FACTORIAL
= 1 * 1 = 1. With this computation, FACTORIAL (1) terminates its execution. As said earlier,
FACTORIAL (1) returns the computed value of 1 to Step 3 of the previous call FACTORIAL (2). Once
again it yields the result FACTORIAL = n * y = 2 * 1 = 2, which terminates the procedure
call to FACTORIAL (2) and returns the result to Step 3 of the previous call FACTORIAL (3) and so
on.







Observe that the stack data structure grows due to a series of push operations during the 
procedure calls and unwinds itself by a series of pop operations until it reaches the step 
associated with the first procedure call, to complete its execution and display the result. 

During the execution of FACTORIAL (5), the first and the oldest call to be made, y in Step 3 
computes y = FACTORIAL(4) = 24 and proceeds to obtain FACTORIAL = n * y = 5 * 24 
= 120 which is the desired result. 
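The growth and unwinding of the stack can be imitated with an explicit stack in place of the recursive calls. The sketch below is a simplification made for illustration: only n is stacked for each pending call (x and y of Fig. 4.4 are not recorded), and the function name factorial_with_stack is an assumption.

```python
def factorial_with_stack(n):
    """Compute n! by managing the pending calls of FACTORIAL on an explicit stack."""
    stack = []
    # Growth phase: each pending call FACTORIAL(n) is pushed while the
    # procedure "calls itself" with n - 1 (Steps 2 and 3).
    while n != 0:
        stack.append(n)
        n = n - 1
    result = 1                    # base case: FACTORIAL(0) = 1 (Step 1)
    # Unwinding phase: each pop resumes one pending call at Step 4,
    # where y holds the value just returned.
    while stack:
        n = stack.pop()
        result = n * result       # FACTORIAL = n * y
    return result


print(factorial_with_stack(5))    # 120
```

For FACTORIAL (5) the stack grows to five entries (5, 4, 3, 2, 1) before the base case is reached, and the last entry pushed is the first to be completed, in keeping with the LIFO principle.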


Tail recursion Tail recursion or Tail-end recursion is a special case of recursion where a 
recursive call to the function turns out to be the last action in the calling function. Note that the 
recursive call needs to be the last executed statement in the function and not necessarily the 
last statement in the function. 


Generally, in a stack implementation of a recursive call, all the local variables of the function 
that are to be “remembered”, are pushed into the stack when the call is made. Upon termination 
of the recursive call, the local variables are popped out and restored to their previous values. 
Now for tail recursion, since the recursive call turns out to be the last executed statement, there 
is no need that the local variables must be pushed into a stack for them to be “remembered” and 
“restored” on termination of the recursive call. This is because when the recursive call ends, the 
calling function itself terminates at which all local variables are automatically discarded. 

Tail recursion is considered important in many high level languages, especially functional 
programming languages. These languages rely on tail recursion to implement iteration. It is 
known that compared to iterations, recursions need more stack space and tail recursions are ideal 
candidates for transformation into iterations. 
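As a hedged illustration of this transformation, the sketch below writes the factorial in a tail recursive form and then as the equivalent loop. The names factorial_tail, factorial_loop and the accumulator parameter acc are assumptions made for this example. Note that the standard Python interpreter does not itself eliminate tail calls, so the recursive version still consumes stack space there; the point is only to show why the conversion is mechanical.

```python
def factorial_tail(n, acc=1):
    """Tail recursive form: the recursive call is the last action executed."""
    if n <= 1:
        return acc
    # Nothing remains to be done after this call returns,
    # so no local variable needs to be remembered.
    return factorial_tail(n - 1, acc * n)


def factorial_loop(n):
    """The same computation with the tail call replaced by a loop."""
    acc = 1
    while n > 1:
        acc, n = acc * n, n - 1   # corresponds to the argument update of the tail call
    return acc


print(factorial_tail(5), factorial_loop(5))   # 120 120
```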


Evaluation of expressions 


Infix, Prefix and Postfix Expressions The evaluation of expressions is an important feature
of compiler design. When we write or understand an arithmetic expression, for example, -(A + B)
↑ C * D + E, we do so by following the scheme of <operand> <operator> <operand> (i.e.) an
<operator> is preceded and succeeded by an <operand>. Such an expression is termed an infix
expression. It is already known how infix expressions used in programming languages have been
accorded rules of hierarchy, precedence and associativity to ensure that the computer does not
misinterpret the expression but computes its value in a unique way.


In reality the compiler reworks the infix expression to produce an equivalent expression
which follows the scheme of <operand> <operand> <operator> and is known as a postfix expression.
For example, the infix expression a + b would have the equivalent postfix expression a b +. A third
category of expression is the one which follows the scheme of <operator> <operand> <operand>
and is known as a prefix expression. For example, the equivalent prefix expression corresponding
to a + b is + a b. Examples 4.2 and 4.3 illustrate the hand computation of postfix and prefix
expressions from a given infix expression.


Example 4.2 Consider an infix expression a + b * c - d. The equivalent postfix expression can
be hand computed by decomposing the original expression into sub expressions based on the
usual rules of hierarchy, precedence and associativity.







Expression             | Sub expression chosen based on rules of  | Postfix expression
                       | hierarchy, precedence and associativity  |
(i)   a + b * c - d    | b * c                                    | ①: b c *
(ii)  a + ① - d        | a + ①                                    | ②: a b c * +
(iii) ② - d            | ② - d                                    | ③: a b c * + d -


Hence a b c * + d - is the equivalent postfix expression of a + b * c - d.
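The hand computation can be cross-checked mechanically. The sketch below does not follow the decomposition used in the example; it uses a different, well known technique (the operator-stack or "shunting-yard" method) and is offered only as an illustration. Its assumptions: operands are single letters or digits, the exponentiation operator ↑ is typed as ^ and treated as right associative, parentheses are balanced, and unary operators are not handled. The names PRECEDENCE, RIGHT_ASSOC and infix_to_postfix are hypothetical.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}   # ^ stands for the ↑ operator
RIGHT_ASSOC = {"^"}                                      # assumed right associative


def infix_to_postfix(expr):
    """Convert an infix string with single-character operands to postfix."""
    output, ops = [], []          # ops is a stack of pending operators
    for tok in expr.replace(" ", ""):
        if tok.isalnum():
            output.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":             # flush operators back to the matching "("
                output.append(ops.pop())
            ops.pop()                         # discard the "("
        else:
            # Pop operators that must be applied before tok.
            while ops and ops[-1] != "(" and (
                PRECEDENCE[ops[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[ops[-1]] == PRECEDENCE[tok] and tok not in RIGHT_ASSOC)
            ):
                output.append(ops.pop())
            ops.append(tok)
    while ops:
        output.append(ops.pop())
    return " ".join(output)


print(infix_to_postfix("a + b * c - d"))   # a b c * + d -
```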


Example 4.3 Consider the infix expression (a + b - f * h) ↑ d. The equivalent prefix expression
is hand computed as given below:


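Expression                    | Sub expression chosen based on rules of  | Prefix expression
                              | hierarchy, precedence and associativity  |
(i)   (a + b - f * h) ↑ d     | f * h                                    | ①: * f h
(ii)  (a + b - ①) ↑ d         | a + b                                    | ②: + a b
(iii) (② - ①) ↑ d             | ② - ①                                    | ③: - + a b * f h
(iv)  ③ ↑ d                   | ③ ↑ d                                    | ④: ↑ - + a b * f h d

Hence the equivalent prefix expression of (a + b - f * h) ↑ d is ↑ - + a b * f h d.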







Evaluation of postfix expressions As discussed earlier, the compiler finds it convenient to
evaluate an expression in its postfix form. The virtues of the postfix form include the elimination
of parentheses which signify priority of evaluation and the elimination of the need to observe rules
of hierarchy, precedence and associativity during evaluation of the expression. This implies that
the evaluation of a postfix expression is done by merely undertaking a left to right scan of the
expression, pushing operands into a stack, evaluating each operator with the appropriate
number of operands popped out from the stack and finally placing the output of the evaluated
operator back into the stack.


Algorithm 4.3 illustrates the evaluation of a postfix expression. Here the postfix expression is 
terminated with $ to signal end of input. 


Algorithm 4.3: Procedure to evaluate a postfix expression E

procedure EVAL_POSTFIX(E)
   repeat
      x = get_next_character(E);   /* get the next character of expression E */
      case x of
         :x is an operand:   Push x into stack S;
         :x is an operator:  Pop out required number of operands
                             from the stack S, evaluate the
                             operator and push the result into
                             the stack S;
         :x = "$":           Pop out the result from stack S;
      end case
   until x = "$"
end EVAL_POSTFIX.


The evaluation of a postfix expression using Algorithm EVAL_POSTFIX is illustrated in 
Example 4.4. 


Example 4.4 To evaluate the postfix expression of A + B * C ↑ D for A = 2, B = -1,
C = 2 and D = 3, using Algorithm EVAL_POSTFIX.
The equivalent postfix expression can be computed to be A B C D ↑ * +.


The evaluation of the postfix expression using the algorithm is illustrated below: The values of 
the operands pushed into stack S are given within parentheses e.g. A(2), B(-1) etc. 


x  | Stack S                   | Action
A  | A(2)                      | Push A into S
B  | A(2) B(-1)                | Push B into S
C  | A(2) B(-1) C(2)           | Push C into S
D  | A(2) B(-1) C(2) D(3)      | Push D into S
↑  | A(2) B(-1) 8              | Pop out two operands from stack S viz. C(2), D(3). Compute C ↑ D and push the result C ↑ D = 2 ↑ 3 = 8 into stack S.
*  | A(2) -8                   | Pop out B(-1) and 8 from stack S. Compute B * 8 = -1 * 8 = -8 and push the result into stack S.
+  | -6                        | Pop out A(2), -8 from stack S. Compute A + (-8) = 2 - 8 = -6 and push the result into stack S.
$  |                           | Pop out -6 from stack S and output the same as the result.
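The left to right scan of Algorithm EVAL_POSTFIX can be sketched in Python as follows. This is an illustrative sketch, not the book's implementation: the name eval_postfix, the dictionary of operand values, the use of ^ for the ↑ operator and of ~ as a separate token for unary minus are all assumptions made here.

```python
import operator

# Binary operators supported by this sketch; "^" stands for the ↑ operator.
BINARY = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def eval_postfix(expression, values):
    """Evaluate a postfix string of single-character tokens terminated by "$"."""
    stack = []
    for x in expression:
        if x == "$":                      # end of input: the result is on top of the stack
            return stack.pop()
        if x in BINARY:                   # binary operator: pop two operands
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[x](left, right))
        elif x == "~":                    # assumed token for unary minus: pop one operand
            stack.append(-stack.pop())
        else:                             # operand: push its value
            stack.append(values[x])
    raise ValueError("expression is not terminated by $")


print(eval_postfix("ABCD^*+$", {"A": 2, "B": -1, "C": 2, "D": 3}))   # -6
```

Giving unary minus its own token mirrors the idea that a compiler generates different tokens for unary and binary minus, so the number of operands to pop is never ambiguous.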


ADT for Stacks

Data objects:
   A finite set of elements of the same type
Operations:
   • Create an empty stack and initialize top of stack
        CREATE(STACK)
   • Check if stack is empty
        CHK_STACK_EMPTY(STACK) (Boolean function)
   • Check if stack is full
        CHK_STACK_FULL(STACK) (Boolean function)
   • Push ITEM into stack STACK
        PUSH(STACK, ITEM)
   • Pop element from stack STACK and output the element popped in ITEM
        POP(STACK, ITEM)
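One possible realisation of these operations over a fixed-size array is sketched below in Python. The class name Stack, the method names and the choice of raising exceptions on the stack full and stack empty conditions are assumptions of this sketch, not part of the ADT.

```python
class Stack:
    """Array-based stack of bounded capacity (illustrative sketch)."""

    def __init__(self, capacity):          # CREATE
        self._items = [None] * capacity
        self._top = 0                      # number of elements currently stored

    def is_empty(self):                    # check if stack is empty
        return self._top == 0

    def is_full(self):                     # check if stack is full
        return self._top == len(self._items)

    def push(self, item):                  # PUSH
        if self.is_full():
            raise OverflowError("stack full")
        self._items[self._top] = item
        self._top += 1

    def pop(self):                         # POP, returning the popped element
        if self.is_empty():
            raise IndexError("stack empty")
        self._top -= 1
        return self._items[self._top]


s = Stack(3)
for value in (10, 20, 30):
    s.push(value)
print(s.is_full(), s.pop(), s.pop())   # True 30 20
```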


Summary


> A stack data structure is an ordered list with insertions and deletions done at one end of
the list known as top of stack.
> An insert operation is called a push operation and a delete operation is called a pop
operation.
> A stack can be commonly implemented using the array data structure. However, in such
a case it is essential to take note of stack full / stack empty conditions during the
implementation of push and pop operations respectively.
> Two applications of the stack data structure, viz.,
(i) Handling recursive programming, and
(ii) Evaluation of postfix expressions
have been detailed.













Illustrative Problems


Problem 4.1 Following is a pseudo code of a series of operations on a stack S. PUSH(S, X)
pushes an element X into S, POP(S, X) pops out an element from stack S as X, PRINT(X)
displays the variable X and EMPTYSTACK(S) is a Boolean function which returns true if S is empty
and false otherwise. What is the output of the code?

 1. X := 30;
 2. Y := 15;
 3. Z := 20;
 4. PUSH(S, X);
 5. PUSH(S, 40);
 6. POP(S, Z);
 7. PUSH(S, Y);
 8. PUSH(S, 30);
 9. PUSH(S, Z);
10. POP(S, X);
11. PUSH(S, 20);
12. PUSH(S, X);
13. while not EMPTYSTACK(S) do
14.    POP(S, X);
15.    PRINT(X);
16. end

Solution: We track the contents of the stack S (bottom to top) and the values of the variables
X, Y, Z as below:

Steps  | Stack S (bottom to top) | Variables changed
1-3    | (empty)                 | X = 30, Y = 15, Z = 20
4      | 30                      |
5      | 30 40                   |
6      | 30                      | Z = 40
7      | 30 15                   |
8      | 30 15 30                |
9      | 30 15 30 40             |
10     | 30 15 30                | X = 40
11     | 30 15 30 20             |
12     | 30 15 30 20 40          |

The execution of Steps 13-16 repeatedly pops out the elements from S displaying each element.
The output therefore would be,

40 20 30 15 30

with the stack S empty.







Problem 4.2 Use procedures PUSH(S, X), POP(S, X), PRINT(X) and EMPTYSTACK(S) (as
described in Illustrative Problem 4.1) and TOP_OF_STACK(S) which returns the top element of
stack S to write pseudo code to

(i) Assign X to the bottom element of the stack S leaving the stack empty.
(ii) Assign X to the bottom element of the stack leaving the stack unchanged.
(iii) Assign X to the nth element in the stack (from the top) leaving the stack unchanged.

Solution:
(i)   while not EMPTYSTACK(S) do
         POP(S, X)
      end
      PRINT(X);

      X holds the element at the bottom of the stack.

(ii)  Since the stack S has to be left unchanged we make use of another stack T to temporarily
      hold the contents of S.

      while not EMPTYSTACK(S) do
         POP(S, X)
         PUSH(T, X)
      end                  /* empty contents of S into T */
      PRINT(X);            /* output X */
      while not EMPTYSTACK(T) do
         POP(T, Y)
         PUSH(S, Y)
      end                  /* empty contents of T back into S */

(iii) We make use of a stack T to remember the top n elements of stack S before replacing them
      back into S.

      for i := 1 to n do
         POP(S, X)
         PUSH(T, X)
      end                  /* push top n elements of S into T */
      PRINT(X);            /* display X */
      for i := 1 to n do
         POP(T, Y);
         PUSH(S, Y);
      end                  /* replace the top n elements available in T back into S */
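Parts (ii) and (iii) can be tried out with a Python list acting as the stack (append for push, pop for pop, the end of the list being the top). The function names bottom_element and nth_from_top are hypothetical, and the sketch assumes the stack holds at least n elements.

```python
def bottom_element(s):
    """Return the bottom element of stack s, leaving s unchanged."""
    t = []                    # temporary stack T
    x = None
    while s:                  # empty contents of S into T
        x = s.pop()
        t.append(x)
    while t:                  # empty contents of T back into S
        s.append(t.pop())
    return x


def nth_from_top(s, n):
    """Return the nth element from the top of s, leaving s unchanged."""
    t = []
    x = None
    for _ in range(n):        # move the top n elements of S into T
        x = s.pop()
        t.append(x)
    while t:                  # replace them back into S in the original order
        s.append(t.pop())
    return x


stack = [10, 20, 30, 40]      # 40 is on top
print(bottom_element(stack), nth_from_top(stack, 2), stack)
# 10 30 [10, 20, 30, 40]
```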


Problem 4.3 What is the output produced by the following segment of code where for a
stack S, PUSH(S, X), POP(S, X), PRINT(X), EMPTY_STACK(S) are procedures as described
in Illustrative Problem 4.1 and CLEAR(S) is a procedure which empties the contents of the stack S?

 1. TERM = 3;
 2. CLEAR(STACK);
 3. repeat
 4.    if TERM <= 12 then
 5.       PUSH(STACK, TERM);
          TERM = 2 * TERM;
 6.    else
 7.       POP(STACK, TERM);
 8.       PRINT(TERM);
 9.       TERM = 3 * TERM + 2;
10. until EMPTY_STACK(STACK) and TERM > 15.







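Solution: Let us keep track of the stack contents and the variable TERM as shown below:

Stack STACK (bottom to top) | TERM | Output displayed
(empty)                     | 3    |
3                           | 6    |
3 6                         | 12   |
3 6 12                      | 24   |
3 6                         | 38   | 12
3                           | 20   | 6
(empty)                     | 11   | 3
11                          | 22   |
(empty)                     | 35   | 11

The output is 12, 6, 3, 11.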
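The trace can be confirmed by running the same logic. The sketch below is a direct Python transcription of the pseudo code, with a list standing in for STACK and the printed values collected in a list; the function name run_term_program is an assumption.

```python
def run_term_program():
    """Simulate the repeat-until loop of the problem and return the printed values."""
    term = 3
    stack = []                      # CLEAR(STACK)
    output = []
    while True:                     # repeat
        if term <= 12:
            stack.append(term)      # PUSH(STACK, TERM)
            term = 2 * term
        else:
            term = stack.pop()      # POP(STACK, TERM)
            output.append(term)     # PRINT(TERM)
            term = 3 * term + 2
        if not stack and term > 15: # until EMPTY_STACK(STACK) and TERM > 15
            break
    return output


print(run_term_program())   # [12, 6, 3, 11]
```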


Problem 4.4 For the following pseudo code of a recursive program mod which computes a
mod b given a, b as inputs, trace the stack contents during the execution of the call mod(23, 7).

procedure mod(a, b)
   if (a < b) then mod = a
   else
      x1 = a - b;
      y1 = mod(x1, b);
      mod = y1;
end mod










Solution: We open a stack structure to track the variables a, b, x1, y1 as shown below. The
snapshots of the stack during recursion are shown, each stack entry listing a, b, x1, y1.

[Figure: stack snapshots during the execution of mod(23, 7)]
(a) call mod(23, 7): the stack holds (23, 7, 16, -)
(b) call mod(16, 7): (16, 7, 9, -) is pushed above (23, 7, 16, -)
(c) call mod(9, 7)
(d) call mod(2, 7)
(e) After termination of mod(2, 7)
(f) After termination of mod(9, 7) and mod(16, 7): the stack holds (23, 7, 16, 2)
(g) After termination of mod(23, 7); output: 2
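The sequence of calls in the snapshots can be reproduced with a short Python sketch of the same recursion. The optional calls parameter, used to record each invocation, is an addition made for illustration; the sketch assumes a >= 0 and b > 0.

```python
def mod(a, b, calls=None):
    """Return a mod b by repeated subtraction, optionally recording each call."""
    if calls is not None:
        calls.append((a, b))      # one entry per activation pushed on the stack
    if a < b:
        return a                  # base case: a is already the remainder
    x1 = a - b
    y1 = mod(x1, b, calls)        # the value returned by the deeper call
    return y1


history = []
print(mod(23, 7, history))   # 2
print(history)               # [(23, 7), (16, 7), (9, 7), (2, 7)]
```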








Problem 4.5 For the infix expression (-(A + B + C) ↑ D) * (G + H), obtain (i) the equivalent
postfix expression, (ii) the equivalent prefix expression, and (iii) evaluate the equivalent postfix
expression obtained in (i) using the algorithm EVAL_POSTFIX (Algorithm 4.3), with A = 1, B = 10,
C = 1, D = 2, G = -1 and H = 6.


Solution: (i), (ii): We demonstrate the steps to compute the prefix expression and postfix expression
in parallel in the following table:

Expression                    | Sub-expression chosen based on rules of  | Equivalent postfix expression      | Equivalent prefix expression
                              | hierarchy, precedence and associativity  |                                    |
(-(A + B + C) ↑ D) * (G + H)  | (A + B + C)                              | ①: A B + C +                       | ①: + + A B C
(-① ↑ D) * (G + H)            | -①                                       | ②: A B + C + -                     | ②: - + + A B C
(② ↑ D) * (G + H)             | (② ↑ D)                                  | ③: A B + C + - D ↑                 | ③: ↑ - + + A B C D
③ * (G + H)                   | (G + H)                                  | ④: G H +                           | ④: + G H
③ * ④                         | ③ * ④                                    | ⑤: A B + C + - D ↑ G H + *         | ⑤: * ↑ - + + A B C D + G H

[Note: (A + B + C) is equivalent to the two subexpressions ((A + B) + C).]

The equivalent postfix and prefix expressions are A B + C + - D ↑ G H + * and * ↑ - + + A B C D
+ G H respectively.


(iii) To evaluate A B + C + - D ↑ G H + * $ for A = 1, B = 10, C = 1, D = 2, G = -1 and H = 6, using
Algorithm EVAL_POSTFIX, the steps are listed in the following table:

x     | Stack S              | Action
A     | A(1)                 | Push A into S
B     | A(1) B(10)           | Push B into S
+     | 11                   | Evaluate A + B and push result into S
C     | 11 C(1)              | Push C into S
+     | 12                   | Evaluate 11 + C and push result into S
- #   | -12                  | Evaluate (unary minus) -12 and push result into S
D     | -12 D(2)             | Push D into S
↑     | 144                  | Evaluate (-12) ↑ D and push result into S
G     | 144 G(-1)            | Push G into S
H     | 144 G(-1) H(6)       | Push H into S
+     | 144 5                | Evaluate G + H and push result into S
*     | 720                  | Evaluate 144 * 5 and push result into S
$     |                      | Output 720

#: A compiler basically distinguishes between a unary "-" and a binary "-" by generating different tokens.
Hence there is no ambiguity regarding the number of operands to be popped out from the stack when the
operator is "-". In the case of a unary "-" a single operand is popped out and in the case of a binary "-"
two operands are popped out from the stack.


Review Questions


1. Which among the following properties does not hold good in a stack?
   (i) A stack supports the principle of Last In First Out
   (ii) A push operation decrements the top pointer
   (iii) A pop operation deletes an item from the stack
   (iv) A linear stack has limited capacity
   (a) (i)   (b) (ii)   (c) (iii)   (d) (iv)
2. A linear stack S is implemented using an array as shown below. The TOP pointer which
   points to the top most element of the stack is set as shown.

   [1] [2] [3] [4] [5]
   Bottom of stack ↑      ↑ TOP

   Execution of the operation PUSH(S, 'W') would result in
   (a) TOP = 4   (b) TOP = 5   (c) Stack full condition   (d) TOP = 3
3. For the linear stack shown in Review Question 2, execution of the operations POP(S),
   POP(S), PUSH(S, 'U'), POP(S) in a sequential fashion would leave the element _____
   on top of the stack with the TOP pointer set to the value _____.
   (a) Y, 2   (b) U, 3   (c) U, 1   (d) U, 4
4. The equivalent postfix expression for the infix expression a + b + c is
   (a) abc++   (b) ab+c+   (c) ab++c   (d) a++bc
5. The equivalent postfix expression for the infix expression a ↑ b ↑ c ↑ d is
   (a) ab↑cd↑↑   (b) abc↑↑↑d   (c) ab↑c↑d↑   (d) abcd↑↑↑
6. How are insert operations carried out in a stack?
7. What are the demerits of a linear stack?
8. If a stack S[1 : n] were to be implemented with the bottom of the stack at S[n], write a
   procedure to undertake the push operation on S.
9. For the stack S[1 : n] introduced in Review Question 8 of Chapter 4, write a procedure to
   undertake the pop operation on S.
10. For the following logical expression
    (a and b and c) or d or e or (not h)
    (i) obtain the equivalent postfix expression
    (ii) evaluate the postfix expression for a = true, b = false, c = true, d = true, e = true,
    h = false.







Programming Assignments


1. Implement a stack S of n elements using arrays. Write functions to perform PUSH and POP
   operations. Implement queries using the push and pop functions to
   (i) Retrieve the mth element of the stack S from the top (m < n), leaving the stack without its
       top m - 1 elements.
   (ii) Retain only the elements in the odd positions of the stack and pop out all even positioned
       elements.
       (e.g.) A stack S with elements in positions 1, 2, 3, 4 yields an output stack S with
       elements in positions 1, 2.
2. Write a recursive program to obtain the nth order Fibonacci sequence number. Include
   appropriate input / output statements to track the variables participating in recursion. Do
   you observe the 'invisible' stack at work? Record your observations.
3. Implement a program to evaluate any given postfix expression. Test your program for the
   evaluation of the equivalent postfix form of the expression ((A * B) / D) ↑ C + E - F * H * I
   for A = 1, B = 2, D = 3, C = 14, E = 110, F = 220, H = 16.78, I = 364.621.




CHAPTER 





5.1 Introduction
5.2 Operations on Queues
5.3 Circular Queues
5.4 Other types of Queues
5.5 Applications


In this chapter, we discuss the queue data structure, its operations
and its variants viz., circular queues, priority queues and deques.
The application of the data structure is demonstrated on the
problem of job scheduling in a time sharing system environment.


Introduction 5.1 





A Queue is a linear list in which all insertions are made at one end of the list known as rear or 
tail of the queue and all deletions are made at the other end known as front or head of the queue. 
An insertion operation is also referred to as enqueuing a queue and a deletion operation is referred 
to as dequeuing a queue. 

Figure 5.1 illustrates a queue and its functionality. Here, Q is a queue of three elements a, b, c 
(Fig. 5.1(a)). When an element d is to join the queue, it is inserted at the rear end of the queue 
(Fig. 5.1(b)) and when an element is to be deleted, the one at the front end of the queue, viz, a, 
is deleted automatically (Fig. 5.1(c)). Thus a queue data structure obeys the principle of first in 
first out (FIFO) or first come first served (FCFS). 


[Figure: (a) A queue Q holding a, b, c from front to rear (b) Insert 'd' into Q at the rear (c) Delete from Q at the front]

Fig. 5.1 A queue and its functionality


Many examples of queues occur in everyday life. Figure 5.2(a) illustrates a queue of clients
waiting to be served by a clerk in a booking counter and Fig. 5.2(b) illustrates a trail of
components moving down an assembly line to be processed by a robot at the end of the line. The
FIFO principle is evident in both: insertion takes place at the rear end of the queue when a new
client arrives or a new component is added, and deletion takes place at the front end of the queue
when the service of the client or the processing of the component is complete.




Operations on Queues 5.2





The queue data structure supports two operations, viz., 
(i) Insertion or addition of elements to a queue 
(ii) Deletion or removal of elements from a queue 
Before we proceed to discuss these operations, it is essential to know how queues are 
implemented. 


[Figure: (a) Queue before a booking counter (b) Queue of components in an assembly line; the front and rear ends are marked in each]

Fig. 5.2 Common examples of queues


Queue Implementation 


As discussed for stacks, a common method of implementing a queue data structure is to use 
another sequential data structure, viz, arrays. However, queues have also been implemented 
using a linked data structure (Refer Chapter 7). In this chapter, we confine our discussion to the 
implementation of queues using arrays. 

Figure 5.3 illustrates an array based implementation of a queue. A queue Q of four elements 
R,S,V,G is represented using an array Q [1:7]. Note how the variables FRONT and REAR keep track 
of the front and rear ends of the queue to facilitate execution of insertion and deletion operations 
respectively. 


[Figure: array representation of Q over positions [1] to [7], holding the elements R, S, V, G, with the values of FRONT and REAR shown alongside]

Fig. 5.3 Array implementation of a queue


However, just as in the stack data structure, the array implementation puts a limitation on the
capacity of the queue. In other words, the number of elements in the queue cannot exceed the
maximum dimension of the one dimensional array. Thus a queue that is accommodated in an
array Q[1 : n] cannot hold more than n elements. Hence every insertion of an element into the
queue has to necessarily test for a QUEUE-FULL condition before executing the insertion
operation. Again, each deletion has to ensure that it is not attempted on a queue which is already
empty, calling for the need to test for a QUEUE-EMPTY condition before executing the deletion
operation. But as said earlier with regard to stacks, the linked representation of queues dispenses
with the need for these QUEUE-FULL and QUEUE-EMPTY testing conditions and hence proves
to be elegant and efficient.


Implementation of insert and delete operations on a queue 


Let Q[1 : n] be an array implementation of a queue. Let FRONT and REAR be variables recording 
the front and rear positions of the queue. The FRONT variable points to a position which is 
physically one less than the actual front of the queue. ITEM is the element to be inserted 
into the queue. n is the maximum capacity of the queue. Both FRONT and REAR are initialized to 
0. 

Algorithm 5.1 illustrates the insert operation on a queue. 


Algorithm 5.1: Implementation of an insert operation on a queue


procedure INSERTQ(Q, n, ITEM, REAR)
   /* Insert item ITEM into Q with capacity n */
   if (REAR = n) then QUEUE_FULL;
   REAR = REAR + 1;      /* Increment REAR */
   Q[REAR] = ITEM;       /* Insert ITEM as the rear element */
end INSERTQ





It can be observed in Algorithm 5.1 that addition of every new element into the queue increments
the REAR variable. However, before insertion, the condition whether the queue is full
(QUEUE_FULL) is checked. This ensures that there is no overflow of elements in a queue.

The delete operation is illustrated in Algorithm 5.2. Though a deletion operation automatically 
deletes the front element of the queue, the variable ITEM is used as an output variable to store and 
perhaps display the value of the element removed. 


Algorithm 5.2: Implementation of a delete operation on a queue


procedure DELETEQ(Q, FRONT, REAR, ITEM)
   if (FRONT = REAR) then QUEUE_EMPTY;
   FRONT = FRONT + 1;
   ITEM = Q[FRONT];
end DELETEQ.





In Algorithm 5.2, observe that to perform a delete operation, the participation of both the
variables FRONT and REAR is essential. Before deletion, the condition (FRONT = REAR) checks for the
emptiness of the queue. If the queue is not empty, FRONT is incremented by 1 to point to the
element to be deleted and subsequently the element is removed through ITEM. Note how this
leaves the FRONT variable remembering the position which is one less than the actual front of the
queue. This helps in the usage of (FRONT = REAR) as a common condition for testing whether a
queue is empty, which occurs either after its initialization or after a sequence of insert and delete
operations, when the queue has just emptied itself.







Soon after the queue Q has been initialized, FRONT = REAR = 0. Hence the condition (FRONT =
REAR) ensures that the queue is empty. Again after a sequence of operations when Q has become
partially or completely full and delete operations are repeatedly invoked to empty the queue, it
may be observed how FRONT increments itself in steps of one with every deletion and begins
moving towards REAR. During the final deletion which renders the queue empty, FRONT coincides
with REAR satisfying the condition (FRONT = REAR = k), k ≠ 0. Here k is the position of the last
element to be deleted.

Hence, we observe that in an array implementation of queues, with every insertion, REAR
moves away from FRONT and with every deletion FRONT moves towards REAR. When the queue
is empty, FRONT = REAR is satisfied and when full, REAR = n (the maximum capacity of the queue)
is satisfied.

Queues whose insert/delete operations follow the procedures implemented in Algorithms 5.1 
and 5.2, are known as linear queues to distinguish them from circular queues which will be 
discussed in Sec. 5.3. Example 5.1 demonstrates the working of a linear queue. The time 
complexity to perform a single insert/delete operation in a linear queue is O(1). 
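Algorithms 5.1 and 5.2 can be sketched in Python as below. The class name LinearQueue, the method names and the use of exceptions to signal the QUEUE_FULL and QUEUE_EMPTY conditions are assumptions of this sketch; index 0 of the underlying list is left unused so that positions run from 1 to n as in Q[1 : n].

```python
class LinearQueue:
    """Array-based linear queue following the FRONT/REAR convention of the text."""

    def __init__(self, n):
        self.q = [None] * (n + 1)   # positions 1..n are used; position 0 is unused
        self.n = n
        self.front = 0              # one less than the actual front position
        self.rear = 0

    def insert(self, item):
        if self.rear == self.n:     # QUEUE_FULL test of Algorithm 5.1
            raise OverflowError("QUEUE_FULL")
        self.rear += 1
        self.q[self.rear] = item

    def delete(self):
        if self.front == self.rear: # QUEUE_EMPTY test of Algorithm 5.2
            raise IndexError("QUEUE_EMPTY")
        self.front += 1
        return self.q[self.front]


birds = LinearQueue(3)
for name in ("DOVE", "PEACOCK", "PIGEON"):
    birds.insert(name)
print(birds.delete(), birds.front, birds.rear)   # DOVE 1 3
```

Both operations touch a constant number of variables, in line with the O(1) time complexity stated above.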


Example 5.1 Let BIRDS [1:3] be a linear queue data structure. The working of Algorithms
5.1 and 5.2 demonstrated on the insertions and deletions performed on BIRDS is illustrated in
Table 5.1.


Table 5.1 Insert/delete operations on the queue BIRDS [1:3]

Operation                           | Queue before operation                     | Algorithm invocation             | Queue after operation                      | Remarks
1. Insert 'DOVE' into BIRDS [1:3]   | [-, -, -]  FRONT: 0  REAR: 0               | INSERTQ(BIRDS, 3, 'DOVE', 0)     | [DOVE, -, -]  FRONT: 0  REAR: 1            | Insert 'DOVE' successful
2. Insert 'PEACOCK' into BIRDS [1:3] | [DOVE, -, -]  FRONT: 0  REAR: 1           | INSERTQ(BIRDS, 3, 'PEACOCK', 1)  | [DOVE, PEACOCK, -]  FRONT: 0  REAR: 2      | Insert 'PEACOCK' successful
3. Insert 'PIGEON' into BIRDS [1:3] | [DOVE, PEACOCK, -]  FRONT: 0  REAR: 2      | INSERTQ(BIRDS, 3, 'PIGEON', 2)   | [DOVE, PEACOCK, PIGEON]  FRONT: 0  REAR: 3 | Insert 'PIGEON' successful
4. Insert 'SWAN' into BIRDS [1:3]   | [DOVE, PEACOCK, PIGEON]  FRONT: 0  REAR: 3 | INSERTQ(BIRDS, 3, 'SWAN', 3)     | [DOVE, PEACOCK, PIGEON]  FRONT: 0  REAR: 3 | Insert 'SWAN' failure! QUEUE_FULL condition invoked.
5. Delete                           | [DOVE, PEACOCK, PIGEON]  FRONT: 0  REAR: 3 | DELETEQ(BIRDS, 0, 3, ITEM)       | [-, PEACOCK, PIGEON]  FRONT: 1  REAR: 3    | Delete successful. ITEM = DOVE
6. Delete                           | [-, PEACOCK, PIGEON]  FRONT: 1  REAR: 3    | DELETEQ(BIRDS, 1, 3, ITEM)       | [-, -, PIGEON]  FRONT: 2  REAR: 3          | Delete successful. ITEM = PEACOCK
7. Insert 'SWAN' into BIRDS [1:3]   | [-, -, PIGEON]  FRONT: 2  REAR: 3          | INSERTQ(BIRDS, 3, 'SWAN', 3)     | [-, -, PIGEON]  FRONT: 2  REAR: 3          | Insert 'SWAN' failure! QUEUE_FULL condition invoked.
8. Delete                           | [-, -, PIGEON]  FRONT: 2  REAR: 3          | DELETEQ(BIRDS, 2, 3, ITEM)       | [-, -, -]  FRONT: 3  REAR: 3               | Delete successful. ITEM = PIGEON
9. Delete                           | [-, -, -]  FRONT: 3  REAR: 3               | DELETEQ(BIRDS, 3, 3, ITEM)       | [-, -, -]  FRONT: 3  REAR: 3               | Delete failure! QUEUE_EMPTY condition invoked.


Limitations of linear queues 


Example 5.1 illustrated the implementation of insert and delete operations on a linear queue. In
operation 4 when 'SWAN' was inserted into BIRDS [1:3], the insertion operation was unsuccessful
since the QUEUE_FULL condition was invoked. Also, one observes the queue BIRDS to be
physically full, justifying the condition. But after operations 5 and 6 were performed and
two elements viz., DOVE and PEACOCK were deleted, despite the space this had created to
accommodate two more insertions, the insertion of 'SWAN' attempted in operation 7 was rejected
once again due to the invocation of the QUEUE_FULL condition. This is a gross limitation of a linear
queue since the QUEUE_FULL condition does not check whether Q is 'physically' full. It merely relies
on the condition (REAR = n) which may turn out to be true even for a queue that is only partially
full as shown in operation 7 of Example 5.1.

When one contrasts this implementation with the working of a queue that one sees around in
everyday life, it is easy to see that with every deletion (after completion of service at one end
of the queue) the remaining elements move forward towards the head of the queue leaving no
gaps in-between. This obviously makes room for that many insertions to be accommodated at the
tail end of the queue depending on the space available.

However, implementing this strategy during every deletion of an element is not
worthwhile, since data movement is always computationally expensive and may render the process
of queue maintenance highly inefficient.

In short, when a QUEUE_FULL condition is invoked it does not necessarily imply that the queue
is 'physically' full. This leads to the limitation of rejecting insertions despite the space available
to accommodate them. The rectification of this limitation leads to what are known as circular
queues.




Circular Queues 5.3 





In this section we discuss the implementation of and the operations on circular queues, which serve
to rectify the limitation of linear queues.

As the name indicates, a circular queue is not linear in structure but circular. In
other words, the FRONT and REAR variables which displayed a linear (left to right) movement over
a queue, display a circular (clockwise) movement over the queue data structure.


Operations on a circular queue 


Let CIRC_Q be a circular queue with a capacity of three elements as shown in Fig. 5.4(a). The
queue is obviously full with FRONT pointing to the element at the head of the queue and REAR
pointing to the element at the tail end of the queue. Let us now perform two deletions and then
attempt insertions of 'd' and 'e' into the queue.


[Figure: (a) Initial circular queue CIRC_Q (b) Circular queue after two deletions (c) Circular queue after insertion of d, e; the FRONT and REAR positions are marked in each]

Fig. 5.4 Working of a circular queue


Observe the circular movement of the FRONT and REAR variables. After two deletions, FRONT
moves towards REAR and points to 'c' as the current front element of CIRC_Q (Fig. 5.4(b)). When
'd' is inserted, unlike linear queues, REAR curls back in a clockwise fashion to accommodate 'd'
in the vacant space available. A similar procedure follows for the insertion of 'e' as well
(Fig. 5.4(c)).

Figure 5.5 emphasizes this circular movement of FRONT and REAR variables over a general 
circular queue during a sequence of insertions/deletions. 

A circular queue when implemented using arrays is not different from linear queues in their 
physical storage. In other words, a linear queue is conceptually viewed to have a circular form 
to understand the clockwise movement of FRONT and REAR variables as shown in Fig. 5.6. 


Implementation of insertion and deletion operations in a circular queue 


Algorithms 5.3 and 5.4 illustrate the implementation of insert and delete operations in a circular
queue respectively. The circular movement of FRONT and REAR variables is implemented using the
mod function which is cyclical in nature. Also the array data structure CIRC_Q to implement the
queue is declared to be CIRC_Q [0 : n - 1] to facilitate the circular operation of FRONT and REAR
variables. As in linear queues, FRONT points to a position which is one less than the actual front
of the circular queue. Both FRONT and REAR are initialized to 0. Note that (n - 1) is the actual
physical capacity of the queue in spite of the array declaration as [0 : n - 1].







[Figure: (a) A circular queue of capacity n at some instance (b) After insertion of further elements a(k+1), a(k+2), ... (c) After s deletions (s > k); the FRONT and REAR positions are marked in each]

Fig. 5.5 Circular movement of FRONT and REAR variables in a circular queue


[Figure: (a) Physical view (b) Conceptual view; the FRONT and REAR positions are marked in each]

Fig. 5.6 Physical and conceptual view of a circular queue


Algorithm 5.3: Implementation of insert operation on a circular queue


procedure INSERT_CIRCQ(CIRC_Q, FRONT, REAR, n, ITEM)
   REAR = (REAR + 1) mod n;
   if (FRONT = REAR) then CIRCQ_FULL;   /* Here CIRCQ_FULL tests for the
                                           queue full condition and if so,
                                           retracts REAR to its
                                           previous value */
   CIRC_Q[REAR] = ITEM;
end INSERT_CIRCQ.










Algorithm 5.4: Implementation of a delete operation on a circular queue

procedure DELETE_CIRCQ(CIRC_Q, FRONT, REAR, n, ITEM)
   if (FRONT = REAR) then CIRCQ_EMPTY;   /* CIRC_Q is physically empty */
   FRONT = (FRONT + 1) mod n;
   ITEM = CIRC_Q[FRONT];
end DELETE_CIRCQ.


The time complexity of each of Algorithms 5.3 and 5.4 is O(1). The working of the algorithms is
demonstrated on an illustration given in Example 5.2.
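A minimal Python sketch of Algorithms 5.3 and 5.4 follows. The class name CircularQueue and the use of exceptions for the CIRCQ_FULL and CIRCQ_EMPTY conditions are assumptions; instead of advancing REAR and retracting it when the queue is full, the sketch computes the candidate position first and commits it only if the insertion can proceed, which has the same effect.

```python
class CircularQueue:
    """Circular queue over an array of n slots, holding at most n - 1 elements."""

    def __init__(self, n):
        self.q = [None] * n         # positions 0..n-1
        self.n = n
        self.front = 0              # one less (mod n) than the actual front position
        self.rear = 0

    def insert(self, item):
        new_rear = (self.rear + 1) % self.n
        if new_rear == self.front:  # CIRCQ_FULL: REAR keeps its previous value
            raise OverflowError("CIRCQ_FULL")
        self.rear = new_rear
        self.q[self.rear] = item

    def delete(self):
        if self.front == self.rear: # CIRCQ_EMPTY
            raise IndexError("CIRCQ_EMPTY")
        self.front = (self.front + 1) % self.n
        return self.q[self.front]


colours = CircularQueue(4)
for name in ("ORANGE", "BLUE", "WHITE"):
    colours.insert(name)
print(colours.delete(), colours.delete())   # ORANGE BLUE
colours.insert("YELLOW")                    # REAR wraps around to position 0
print(colours.front, colours.rear)          # 2 0
```

Keeping one slot unused is what allows FRONT = REAR to mean "empty" without ambiguity, which is why the physical capacity is n - 1.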


Example 5.2 Let COLOURS [0:3] be a circular queue data structure. Note the actual physical 
capacity of the queue is only 3 elements despite the declaration of the array as [0:3]. The operations 
illustrated below (Table 5.2) demonstrate the working of Algorithms 5.3 and 5.4. 


Table 5.2 Insert and delete operations on the circular queue COLOURS [0:3]

Circular queue operation              | Circular queue before operation               | Algorithm invocation                                                             | Circular queue after operation                 | Remarks
1. Insert 'ORANGE' into COLOURS [0:3] | [-, -, -, -]  FRONT: 0  REAR: 0               | INSERT_CIRCQ(COLOURS, 0, 0, 4, 'ORANGE')                                         | [-, ORANGE, -, -]  FRONT: 0  REAR: 1           | Insert 'ORANGE' successful
2. Insert 'BLUE' into COLOURS [0:3]   | [-, ORANGE, -, -]  FRONT: 0  REAR: 1          | INSERT_CIRCQ(COLOURS, 0, 1, 4, 'BLUE')                                           | [-, ORANGE, BLUE, -]  FRONT: 0  REAR: 2        | Insert 'BLUE' successful
3. Insert 'WHITE' into COLOURS [0:3]  | [-, ORANGE, BLUE, -]  FRONT: 0  REAR: 2       | INSERT_CIRCQ(COLOURS, 0, 2, 4, 'WHITE')                                          | [-, ORANGE, BLUE, WHITE]  FRONT: 0  REAR: 3    | Insert 'WHITE' successful
4. Insert 'RED' into COLOURS [0:3]    | [-, ORANGE, BLUE, WHITE]  FRONT: 0  REAR: 3   | INSERT_CIRCQ(COLOURS, 0, 3, 4, 'RED')                                            | [-, ORANGE, BLUE, WHITE]  FRONT: 0  REAR: 3    | CIRCQ_FULL condition is invoked. Insert 'RED' failure! Note: REAR retracts to its previous value of 3.
5, 6. Delete twice from COLOURS [0:3] | [-, ORANGE, BLUE, WHITE]  FRONT: 0  REAR: 3   | DELETE_CIRCQ(COLOURS, 0, 3, 4, ITEM); DELETE_CIRCQ(COLOURS, 1, 3, 4, ITEM)       | [-, -, -, WHITE]  FRONT: 2  REAR: 3            | DELETE operation successful. ITEM = ORANGE, ITEM = BLUE
7. Insert 'YELLOW' into COLOURS [0:3] | [-, -, -, WHITE]  FRONT: 2  REAR: 3           | INSERT_CIRCQ(COLOURS, 2, 3, 4, 'YELLOW')                                         | [YELLOW, -, -, WHITE]  FRONT: 2  REAR: 0       | Insert 'YELLOW' successful
8. Insert 'VIOLET' into COLOURS [0:3] | [YELLOW, -, -, WHITE]  FRONT: 2  REAR: 0      | INSERT_CIRCQ(COLOURS, 2, 0, 4, 'VIOLET')                                         | [YELLOW, VIOLET, -, WHITE]  FRONT: 2  REAR: 1  | Insert 'VIOLET' successful










Other Types of Queues 5.4 


Priority queues 


A priority queue is a queue in which the insertion or deletion of items at any position in the queue
is done based on some property (such as the priority of a task).

For example, let P be a priority queue with three elements a, b, c whose priority factors are
2, 1, 1 respectively. Here, the larger the number, the higher the priority accorded to that element
(Fig. 5.7(a)). When a new element d with a higher priority, viz., 4, is inserted, d joins at the head
of the queue, superseding the remaining elements (Fig. 5.7(b)). When elements in the queue have
the same priority, the priority queue behaves as an ordinary queue, following the principle
of FIFO amongst such elements.

The working of a priority queue may be likened to a situation in which a file of patients wait for
their turn in a queue to have an appointment with a doctor. All patients are accorded equal
priority and follow an FCFS scheme by appointments. However, when a patient with bleeding
injuries is brought in, he/she is accorded high priority and is immediately moved to the head of
the queue for immediate attention by the doctor. This is a priority queue at work.

A common method of implementation of a priority queue is to open as many queues as there
are priority factors. A low priority queue will be operated for deletion only when all its high
priority predecessors are empty. In other words, deletion of an element in a priority queue q_i with
priority p_i is possible only when those queues q_j with priorities p_j (p_j > p_i) are empty. However,
with regard to insertions, an element e_k with priority p_i joins the respective queue q_i, obeying the
scheme of FIFO with regard to the queue q_i alone.


(a) Initial priority queue:   FRONT  a(2) b(1) c(1)  REAR
(b) Insert d(4):              FRONT  d(4) a(2) b(1) c(1)  REAR
(c) Delete:                   FRONT  a(2) b(1) c(1)  REAR

x(y): x is the element with priority y.
Fig. 5.7 A priority queue


Another method of implementation could be to sort the elements in the queue according
to the descending order of priorities every time an insertion takes place. The top priority element
at the head of the queue is the one to be deleted.

The choice of implementation depends on a time-space trade-off decision made by the
user. While the first method of implementation of a priority queue, using a cluster of queues,
consumes space, the time complexity of an insertion is only O(1). In the case of deletion of an
element in a specific queue with a specific priority, it calls for checking that all other queues
preceding it in priority are empty.

On the other hand, the second method consumes less space since it handles just a single queue.
However, insertion of every element calls for sorting all the queue elements in the descending
order, the most efficient of which reports a time complexity of O(n log n). With regard to deletion,
the element at the head of the queue is automatically deleted with a time complexity of O(1).

The two methods of implementation of a priority queue are illustrated in Example 5.3. 
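The two methods may be sketched in Python as follows. The class names ClusterPriorityQueue and SortedPriorityQueue are assumptions made for this illustration, and a larger number is taken to mean a higher priority. Note one deliberate simplification: instead of re-sorting the whole queue on every insertion, the second class places each new element directly behind the existing elements of equal or higher priority, which yields the same ordering.

```python
from collections import deque


class ClusterPriorityQueue:
    """Method 1: one FIFO queue per priority number."""

    def __init__(self, priorities):
        self.queues = {p: deque() for p in priorities}

    def insert(self, item, priority):
        self.queues[priority].append(item)      # joins the rear of its own queue

    def delete(self):
        # Serve a queue only when every higher priority queue is empty.
        for p in sorted(self.queues, reverse=True):
            if self.queues[p]:
                return self.queues[p].popleft()
        return None


class SortedPriorityQueue:
    """Method 2: a single queue kept in descending order of priority."""

    def __init__(self):
        self.items = []                          # (priority, item) pairs

    def insert(self, item, priority):
        pos = 0
        # Skip elements of equal or higher priority to preserve FIFO among equals.
        while pos < len(self.items) and self.items[pos][0] >= priority:
            pos += 1
        self.items.insert(pos, (priority, item))

    def delete(self):
        return self.items.pop(0)[1] if self.items else None


for pq in (ClusterPriorityQueue([0, 1, 2]), SortedPriorityQueue()):
    for job, prio in [("J1", 1), ("J2", 1), ("J3", 0), ("J4", 2), ("J5", 2)]:
        pq.insert(job, prio)
    print([pq.delete() for _ in range(3)])       # ['J4', 'J5', 'J1'] in both cases
```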







Example 5.3 Let JOB be a queue of jobs to be undertaken at a factory shop floor for service
by a machine. Let high (2), medium (1) and low (0) be the priorities accorded to jobs. Let J_i(k)
indicate a job J_i to be undertaken with priority k. The implementations of a priority queue to keep
track of the jobs, using the two methods of implementation discussed above, are illustrated for
a sample set of job arrivals (insertions) and job service completions (deletions).

Opening JOB queue: J_1(1) J_2(1) J_3(0)
Operations on the JOB queue in chronological order:
1. J_4(2) arrives
2. J_5(2) arrives
3. Execute job
4. Execute job
5. Execute job


Step               | Implementation as a cluster of queues                          | Implementation by sorting queue elements | Remarks
Initial            | High (2): empty; Medium (1): J_1(1) J_2(1); Low (0): J_3(0)    | J_1(1) J_2(1) J_3(0)                     | Opening JOB queue
1. J_4(2) arrives  | High (2): J_4(2); Medium (1): J_1(1) J_2(1); Low (0): J_3(0)   | J_4(2) J_1(1) J_2(1) J_3(0)              | Insert J_4(2)
2. J_5(2) arrives  | High (2): J_4(2) J_5(2); Medium (1): J_1(1) J_2(1); Low (0): J_3(0) | J_4(2) J_5(2) J_1(1) J_2(1) J_3(0)  | Insert J_5(2)
3. Execute job     | High (2): J_5(2); Medium (1): J_1(1) J_2(1); Low (0): J_3(0)   | J_5(2) J_1(1) J_2(1) J_3(0)              | J_4(2) is deleted
4. Execute job     | High (2): empty; Medium (1): J_1(1) J_2(1); Low (0): J_3(0)    | J_1(1) J_2(1) J_3(0)                     | J_5(2) is deleted
5. Execute job     | High (2): empty; Medium (1): J_2(1); Low (0): J_3(0)           | J_2(1) J_3(0)                            | J_1(1) is deleted

In each queue the leftmost job is at the front and is the next to be passed to the machine for service.
A 










A variant of the implementation of a priority queue using multiple queues is to make use of 
a single two dimensional array to represent the list of queues and their contents. The number of 
rows in the array is equal to the number of priorities accorded to the data elements and the 
columns are equal to the maximum number of elements that can be accommodated in the queues 
corresponding to the priority number. Thus, if PRIO_QUE[1:m, 1:n] is an array representing a 
priority queue, then the data items joining the queue may have priority numbers ranging from 
1 to m and corresponding to each queue representing a priority, a maximum of n elements can 
be accommodated. Illustrative Problem 5.4 demonstrates the implementation of a priority queue
as a two dimensional array.
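A minimal Python sketch of the two dimensional array variant is given below. The class name ArrayPriorityQueue and the per-row counters are assumptions made for the example. Following Illustrative Problem 5.4, the sketch assumes that priority number 1 is served first, and for simplicity each row is treated as a linear queue whose remaining items are shifted left after a deletion.

```python
class ArrayPriorityQueue:
    """Priority queue stored as an m x n table, one row per priority number.

    Assumption: priority number 1 is the highest and is served first.
    """

    def __init__(self, m, n):
        self.table = [[None] * n for _ in range(m)]
        self.count = [0] * m            # number of items currently in each row

    def insert(self, item, priority):
        row = priority - 1              # priorities run from 1 to m
        if self.count[row] == len(self.table[row]):
            return False                # the queue for this priority is full
        self.table[row][self.count[row]] = item
        self.count[row] += 1
        return True

    def delete(self):
        for row, used in enumerate(self.count):
            if used:
                item = self.table[row][0]
                # Shift the remaining items of this row one cell to the left.
                self.table[row][:used] = self.table[row][1:used] + [None]
                self.count[row] -= 1
                return item
        return None                     # every row is empty


token = ArrayPriorityQueue(3, 2)
token.insert("not", 1)
token.insert("and", 2)
token.insert("or", 2)
print(token.delete())                   # not
```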


Deques 


A deque (double ended queue) is a linear list in which all insertions and deletions are made at
either end of the list. Deque is pronounced as ‘deck’ or ‘de queue’.

A deque is therefore more general than a stack or queue and is a sort of FLIFLO (First In Last
In or First Out Last Out). Thus while one speaks of the top or bottom of a stack, or front or rear
of a queue, one refers to the right end or left end of a deque. The fact that a deque is a generalization
of a stack or queue is illustrated in Fig. 5.8.


Fig. 5.8 A stack, a queue and a deque: a comparison
(The stack is operated at its TOP only; the queue permits deletion at FRONT and insertion at REAR; the deque permits insertion and deletion at both ends.)


A deque has two variants, viz., input restricted deque and output restricted deque. An input 
restricted deque is one where insertions are allowed at one end only while deletions are allowed 
at both ends. On the other hand, an output restricted deque allows insertions at both ends of the 
deque but permits deletions only at one end. 

A deque is commonly implemented as a circular array with two variables LEFT and RIGHT 
taking care of the active ends of the deque. Example 5.4 illustrates the working of a deque with 
insertions and deletions permitted at both ends. 


Example 5.4 Let DEQ[1:6] be a deque implemented as a circular array. The contents of DEQ
and that of LEFT and RIGHT are as given below:

DEQ:    LEFT: 3    RIGHT: 5
  [1]   [2]   [3]   [4]   [5]   [6]
               R     T     S

The following operations demonstrate the working of the deque DEQ which supports insertions
and deletions at both ends.

(i) Insert X at the left end and Y at the right end

DEQ:    LEFT: 2    RIGHT: 6
  [1]   [2]   [3]   [4]   [5]   [6]
         X     R     T     S     Y

(ii) Delete twice from the right end

DEQ:    LEFT: 2    RIGHT: 4
  [1]   [2]   [3]   [4]   [5]   [6]
         X     R     T

(iii) Insert G, Q and M at the left end

DEQ:    LEFT: 5    RIGHT: 4
  [1]   [2]   [3]   [4]   [5]   [6]
   G     X     R     T     M     Q

(iv) Insert J at the right end
Here no insertion is possible since the deque is full. Observe the condition LEFT = RIGHT + 1
when the deque is full.

(v) Delete twice from the left end

DEQ:    LEFT: 1    RIGHT: 4
  [1]   [2]   [3]   [4]   [5]   [6]
   G     X     R     T


It is easy to observe that for insertions at the left end, LEFT is decremented by 1 (mod n) and
for insertions at the right end, RIGHT is incremented by 1 (mod n). For deletions at the left end,
LEFT is incremented by 1 (mod n) and for deletions at the right end, RIGHT is decremented by
1 (mod n), where n is the capacity of the deque. Again, before performing a deletion, if LEFT = RIGHT,
then there is only one element, and in such a case after deletion set LEFT = RIGHT = NIL
to indicate that the deque is empty.
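The index arithmetic described above can be sketched in Python as follows. The class name ArrayDeque and its method names are assumptions made for the illustration. The sketch uses array indices 0 to n - 1 (so the positions are one less than those of Example 5.4), represents NIL by None, and, as an assumed convention, places the first element of an empty deque in cell 0.

```python
class ArrayDeque:
    """Deque on a circular array of n cells; left and right are None when empty."""

    def __init__(self, n):
        self.n = n
        self.cells = [None] * n
        self.left = None
        self.right = None

    def is_empty(self):
        return self.left is None

    def is_full(self):
        return not self.is_empty() and (self.right + 1) % self.n == self.left

    def insert_left(self, item):
        if self.is_full():
            return False
        if self.is_empty():
            self.left = self.right = 0
        else:
            self.left = (self.left - 1) % self.n     # LEFT decremented (mod n)
        self.cells[self.left] = item
        return True

    def insert_right(self, item):
        if self.is_full():
            return False
        if self.is_empty():
            self.left = self.right = 0
        else:
            self.right = (self.right + 1) % self.n   # RIGHT incremented (mod n)
        self.cells[self.right] = item
        return True

    def delete_left(self):
        if self.is_empty():
            return None
        item, self.cells[self.left] = self.cells[self.left], None
        if self.left == self.right:                  # only one element was present
            self.left = self.right = None
        else:
            self.left = (self.left + 1) % self.n
        return item

    def delete_right(self):
        if self.is_empty():
            return None
        item, self.cells[self.right] = self.cells[self.right], None
        if self.left == self.right:
            self.left = self.right = None
        else:
            self.right = (self.right - 1) % self.n
        return item
```

An input restricted or output restricted deque is obtained by simply not exposing one of the four operations.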





Applications 5.5 


In this section we discuss the application of a linear queue and a priority queue in the scheduling 
of jobs by a processor in a time sharing system. 


Application of a linear queue 


Figure 5.9 shows a basic diagram of a time-sharing system. A CPU (processor) endowed with 
memory resources, is to be shared by n number of computer users. The sharing of the processor 







Fig. 5.9 A basic diagram of a time-sharing system (user terminals connected to a central processor)


and memory resources is done by allotting a definite time slice of the processor’s attention to each
user in a round-robin fashion. In a system such as this, the users are unaware of the presence
of other users and are led to believe that their job receives the undivided attention of the CPU.
However, to keep track of the jobs initiated by the users, the processor relies on a queue data
structure recording the active user ids. Example 5.5 demonstrates the application of a queue data
structure for this job-scheduling problem.


Example 5.5 The following is a table of three users A, B, C with their job requests J_i(k) where
i is the job number and k is the time required to execute the job.


Job requests and the execution time in µ secs


J_1(4), J_2(3)


Js (2), J4 0), J5 0) 





Thus J_1(4), a job request initiated by A, needs 4 µ secs for its execution before the user initiates
the next request of J_2(3). Throughout the simulation, we assume a uniform user delay period of
5 µ secs between any two sequential job requests initiated by a user. Thus B initiates J_4(1), 5 µ secs
after the completion of J_3(2) and so on. Also, to simplify simulation, we assume that the CPU
gives whole attention to the completion of a job request before moving to the next job request.
In other words, all the job requests complete their execution well within the time slice allotted
to them. To initiate the simulation, we assume that A logged in at time 0, B at time 1 and C at
time 2. Figure 5.10 shows a graphical illustration of the simulation. Note that at time 2, while A’s
J_1(4) is being executed, B is in the wait mode with J_3(2) and C has just logged in. The objective
is to ensure the CPU’s attention to all the jobs logged in according to the principle of FIFO.
To tackle such a complex scenario, a queue data structure comes in handy. As soon as a job
request is made by a user, the user id is inserted into a queue. A job that is to be processed next













Fig. 5.10 Time sharing system simulation: non-priority based job requests
(Legend: job execution, job waiting, user delay, CPU busy, CPU idle, plotted for users A, B and C against time.)


would be the one at the head of the queue. A job until its execution is complete remains at the 
head of the queue. Once the request has been processed and execution is complete, the user id 
is deleted from the queue. 

A snapshot of the queue data structure at times 5, 10 and 14 is shown in Fig. 5.11. Observe
that during the time period 16-21 the CPU is left idle.


Fig. 5.11 Snapshot of the queue at times 5, 10 and 14


Application of priority queues 


Assume a time-sharing system in which job requests by users are of different categories. For
example, some requests may be real time, others online and the rest batch processing
requests. It is known that real time job requests carry the highest priority, followed by online
processing and batch processing in that order. In such a situation the job scheduler needs to 
maintain a priority queue to execute the job requests based on their priorities. If the priority 
queue were to be implemented using a cluster of queues of varying priorities, the scheduler has 
to maintain one queue for real time jobs (R), one for online processing jobs (O) and the third for 
batch processing jobs (B). The CPU proceeds to execute a job request in O only when R is empty. 
In other words all real time jobs awaiting execution in R have to be completed and cleared before 







execution of a job request from O. In the case of queue B, before executing a job in queue B, the 
queues R and O should be empty. Example 5.6 illustrates the application of a priority queue in 
a time-sharing system with priority-based job requests. 


Example 5.6 The following is a table of three users A, B, C with their job requests. R_i(k)
indicates a real time job R_i whose execution time is k µ secs. Similarly B_i(k) and O_i(k) indicate
batch processing and online processing jobs respectively.

User | Job requests and their execution time in µ secs
A    | R_1(4), B_1(1)
B    | O_1(2), O_2(3), B_2(3)
C    | R_2(1), B_3(2), O_3(3)





As before, we assume a user delay of 5 µ secs between any two sequential job requests by the user
and assume that the CPU gives undivided attention to a job request until its completion. Also,
A, B and C log in at times 0, 1 and 2 respectively.

Figure 5.12 illustrates the simulation of the job scheduler for the priority based job requests.
Figure 5.13 shows the snapshot of the priority queue at times 4, 8 and 12. Observe that the
processor, while scheduling jobs and executing them, falls into idle modes during time periods
7-9 and 15-17.





Fig. 5.12 Simulation of the time sharing system for priority based jobs
(Legend: job execution, job waiting, user delay, CPU busy, CPU idle, plotted for users A, B and C against time 0 to 23.)








Fig. 5.13 Snapshots of the priority queue at times 4, 8 and 12
(B: Batch Processing Queue   R: Real Time Queue   O: On-line Priority Queue)
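The scheduling rule of Example 5.6 can be traced with a short Python sketch. The function name simulate_priority, the layout of the users dictionary and the use of the first letter of a job name as its category are assumptions made for this illustration; the job data, the login times and the user delay of 5 µ secs are those of the example. Each job runs to completion once started, as assumed in the text.

```python
from collections import deque

PRIORITY_ORDER = ["R", "O", "B"]      # real time, then online, then batch


def simulate_priority(users, delay=5):
    """users maps a user id to (login time, [(job name, execution time), ...])."""
    pending = {u: deque(jobs) for u, (_, jobs) in users.items()}
    next_request = {u: login for u, (login, _) in users.items()}
    queues = {category: deque() for category in PRIORITY_ORDER}
    clock, schedule, idle = 0, [], []
    while any(pending.values()):
        # Move every request made up to now into the queue of its category.
        arrived = sorted((t, u) for u, t in next_request.items()
                         if t is not None and t <= clock)
        for _, u in arrived:
            job_name, _ = pending[u][0]
            queues[job_name[0]].append(u)
            next_request[u] = None
        ready = [c for c in PRIORITY_ORDER if queues[c]]
        if not ready:                 # nothing to run: the CPU idles
            wake = min(t for t in next_request.values() if t is not None)
            idle.append((clock, wake))
            clock = wake
            continue
        u = queues[ready[0]].popleft()
        job_name, duration = pending[u].popleft()
        schedule.append((job_name, clock, clock + duration))
        clock += duration
        next_request[u] = clock + delay if pending[u] else None
    return schedule, idle


users = {
    "A": (0, [("R1", 4), ("B1", 1)]),
    "B": (1, [("O1", 2), ("O2", 3), ("B2", 3)]),
    "C": (2, [("R2", 1), ("B3", 2), ("O3", 3)]),
}
schedule, idle = simulate_priority(users)
print(idle)                           # [(7, 9), (15, 17)]
```

The idle periods reported by the sketch agree with those observed in the example.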


ADT for Queues


Data objects
A finite set of elements of the same type


Operations
Create an empty queue and initialize front and rear variables of the queue
   CREATE (QUEUE, FRONT, REAR)

Check if queue QUEUE is empty
   CHK_QUEUE_EMPTY (QUEUE) (Boolean function)

Check if queue QUEUE is full
   CHK_QUEUE_FULL (QUEUE) (Boolean function)

Insert ITEM into queue QUEUE
   ENQUEUE (QUEUE, ITEM)

Delete element from queue QUEUE and output the element deleted in ITEM
   DEQUEUE (QUEUE, ITEM)


Summary


A queue data structure is a linear list in which all insertions are made at the rear end of
the list and deletions are made at the front end of the list.

A queue follows the principle of FIFO or FCFS and is commonly implemented using
arrays. It therefore calls for the testing of QUEUE_FULL/QUEUE_EMPTY conditions
during insert/delete operations respectively.

A linear queue suffers from the drawback of QUEUE_FULL condition invocation even
when the queue is not physically full to its capacity. This limitation is overcome to an
extent in a circular queue.

A priority queue is a queue structure in which elements are inserted or deleted from a queue
based on some property known as priority.

A deque is a double ended queue with insertions and deletions done at either end, or
appropriately restricted at one of the ends.

The application of queues and priority queues has been demonstrated on the problem of
job scheduling in time-sharing system environments.







Illustrative Problems


Problem 5.1 Let INITIALISE (Q) be an operation which initializes a linear queue Q to be
empty. Let ENQUEUE (Q, ITEM) insert an ITEM into Q and DEQUEUE (Q, ITEM) delete an element
from Q through ITEM. EMPTY_QUEUE (Q) is a Boolean function which is true if Q is empty and false
otherwise, and PRINT (ITEM) is a function which displays the value of ITEM.

What is the output of the following pseudo code?

 1. X = Y = Z = 0
 2. INITIALISE (Q)
 3. ENQUEUE (Q, 10)
 4. ENQUEUE (Q, 70)
 5. ENQUEUE (Q, 88)
 6. DEQUEUE (Q, X)
 7. DEQUEUE (Q, Z)
 8. ENQUEUE (Q, X)
 9. ENQUEUE (Q, Y+18)
10. DEQUEUE (Q, X)
11. DEQUEUE (Q, Y)
12. while not EMPTY_QUEUE (Q) do
13.    DEQUEUE (Q, X)
14.    PRINT (X)
15. end

Solution: The contents of the queue Q and the values of the variables X, Y, Z are tabulated
below:

Steps    | Queue Q (front to rear) | X  | Y  | Z  | Output displayed
1-2      | empty                   | 0  | 0  | 0  |
3        | 10                      | 0  | 0  | 0  |
4        | 10 70                   | 0  | 0  | 0  |
5        | 10 70 88                | 0  | 0  | 0  |
6        | 70 88                   | 10 | 0  | 0  |
7        | 88                      | 10 | 0  | 70 |
8        | 88 10                   | 10 | 0  | 70 |
9        | 88 10 18                | 10 | 0  | 70 |
10       | 10 18                   | 88 | 0  | 70 |
11       | 18                      | 88 | 10 | 70 |
12-14    | empty                   | 18 | 10 | 70 | 18

The output of the program code is: 18
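The trace may be checked mechanically with a few lines of Python. This is merely a sketch: collections.deque stands in for the linear queue Q, and the function name run_problem_5_1 is an assumption made for the example.

```python
from collections import deque


def run_problem_5_1():
    """Replays steps 1 to 15 of the pseudo code and returns what is printed."""
    x = y = z = 0
    q = deque()                 # INITIALISE (Q)
    for item in (10, 70, 88):   # steps 3 to 5
        q.append(item)
    x = q.popleft()             # step 6
    z = q.popleft()             # step 7
    q.append(x)                 # step 8
    q.append(y + 18)            # step 9
    x = q.popleft()             # step 10
    y = q.popleft()             # step 11
    printed = []
    while q:                    # steps 12 to 15
        x = q.popleft()
        printed.append(x)
    return printed, (x, y, z)


print(run_problem_5_1())        # ([18], (18, 10, 70))
```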


Problem 5.2 Given Q’ to be a circular queue implemented as an array Q’[0:4] and using the
procedures declared in Illustrative Problem 5.1, but suitable for implementation on Q’, what is the output
of the following code?

[Note: The procedures ENQUEUE (Q’, X) and DEQUEUE (Q’, X) may be assumed to be
implementations of Algorithms 5.3 and 5.4]

 1. INITIALISE (Q’)
 2. X := 56
 3. Y := 77
 4. ENQUEUE (Q’, X)
 5. ENQUEUE (Q’, 50)
 6. ENQUEUE (Q’, Y)
 7. DEQUEUE (Q’, Y)
 8. ENQUEUE (Q’, 22)
 9. ENQUEUE (Q’, X)
10. ENQUEUE (Q’, Y)
11. Z = X - Y
12. if (Z = 0)
13. then {while not EMPTY_QUEUE (Q’)
14.          DEQUEUE (Q’, X)
15.          PRINT (X)
16.       end}
17. else PRINT (“Process Complete”)

Solution: The contents of the circular queue Q’[0:4] and the values of the variables X, Y, Z are
illustrated below.

Steps    | Queue Q’ (front to rear) | X  | Y  | Z | Remarks
1-3      | empty                    | 56 | 77 |   |
4        | 56                       | 56 | 77 |   |
5        | 56 50                    | 56 | 77 |   |
6        | 56 50 77                 | 56 | 77 |   |
7        | 50 77                    | 56 | 56 |   |
8        | 50 77 22                 | 56 | 56 |   |
9        | 50 77 22 56              | 56 | 56 |   |
10       | 50 77 22 56              | 56 | 56 |   | Queue full. ENQUEUE (Q’, Y) fails
11       | 50 77 22 56              | 56 | 56 | 0 |
12-16    | empty                    | 56 | 56 | 0 | Each element is deleted and displayed

Output of the program code: 50 77 22 56


Problem 5.3 S and Q are a stack and a priority queue of integers respectively. The priority
of an element C joining the priority queue Q is computed as C mod 3. In other words, the priority
numbers of the elements are either 0 or 1 or 2. Given A, B, C to be integer variables, what is the
output of the following code? The procedures are similar to those used in Illustrative Problems
5.1 and 5.2. However, the queue procedures are modified to appropriately work
on a priority queue.

 1. A = 10
 2. B = 11
 3. C = A + B
 4. while (C < 110) do
 5.    if (C mod 3) = 0 then PUSH (S, C)
 6.    else ENQUEUE (Q, C)
 7.    A = B
 8.    B = C
 9.    C = A + B
10. end
11. while not EMPTY_STACK (S) do
12.    POP (S, C)
13.    PRINT (C)
14. end
15. while not EMPTY_QUEUE (Q) do
16.    DEQUEUE (Q, C)
17.    PRINT (C)
18. end

Solution: Here x(y) denotes the element x with priority y, a larger number indicating a higher priority.

Steps                  | A  | B  | C   | Stack S | Priority queue Q     | Output
1-3                    | 10 | 11 | 21  | empty   | empty                |
4-9 (first pass)       | 11 | 21 | 32  | 21      | empty                |
4-9 (second pass)      | 21 | 32 | 53  | 21      | 32(2)                |
4-9 (third pass)       | 32 | 53 | 85  | 21      | 32(2) 53(2)          |
4-9 (fourth pass)      | 53 | 85 | 138 | 21      | 32(2) 53(2) 85(1)    |
11-14                  | 53 | 85 | 21  | empty   | 32(2) 53(2) 85(1)    | 21
15-18 (first pass)     | 53 | 85 | 32  | empty   | 53(2) 85(1)          | 32
15-18 (second pass)    | 53 | 85 | 53  | empty   | 85(1)                | 53
15-18 (third pass)     | 53 | 85 | 85  | empty   | empty                | 85

The final output is: 21 32 53 85


Problem 5.4 TOKEN is a priority queue for organizing n data items with m priority numbers.
TOKEN is implemented as a two dimensional array TOKEN[1:m, 1:p] where p is the maximum
number of elements with a given priority. Execute the following operations on TOKEN[1:3,
1:2]. Here INSERT(‘xxx’, m) indicates the insertion of item ‘xxx’ with priority number m and
DELETE( ) indicates the deletion of the first among the high priority items.

(i) INSERT(‘not’, 1)
(ii) INSERT(‘and’, 2)
(iii) INSERT(‘or’, 2)
(iv) DELETE( )
(v) INSERT(‘equ’, 3)

Solution: The two dimensional array TOKEN[1:3, 1:2] before the execution of operations is as
given below:

TOKEN:    [1]    [2]
   [1]
   [2]
   [3]

After the execution of operations, TOKEN[1:3, 1:2] is as shown below:

(i) INSERT(‘not’, 1)
(ii) INSERT(‘and’, 2)
(iii) INSERT(‘or’, 2)

TOKEN:    [1]    [2]
   [1]    not
   [2]    and    or
   [3]

(iv) DELETE( )

TOKEN:    [1]    [2]
   [1]
   [2]    and    or
   [3]

Note how ‘not’, which is the first among the
elements with the highest priority, is deleted.

(v) INSERT(‘equ’, 3)

TOKEN:    [1]    [2]
   [1]
   [2]    and    or
   [3]    equ





Problem 5.5 DEQ[1:6] is an output restricted deque implemented as a circular array, and
LEFT and RIGHT indicate the ends of the deque as shown below. INSERT(‘xx’, [LEFT | RIGHT])
indicates the insertion of the data item at the left or right end as the case may be, and DELETE( )
deletes the item from the left end only.

DEQ:    LEFT: 2    RIGHT: 5
  [1]   [2]   [3]   [4]   [5]   [6]
         C1    A4    Y7    N6

Execute the following insertions and deletions on DEQ:
(i) INSERT(‘S5’, LEFT)
(ii) INSERT(‘K9’, RIGHT)
(iii) DELETE( )
(iv) INSERT(‘V7’, LEFT)
(v) INSERT(‘T5’, LEFT)

Solution:
(i) DEQ after the execution of operations (i) INSERT(‘S5’, LEFT) and
(ii) INSERT(‘K9’, RIGHT)

DEQ:    LEFT: 1    RIGHT: 6
  [1]   [2]   [3]   [4]   [5]   [6]
   S5    C1    A4    Y7    N6    K9

(ii) DEQ after the execution of DELETE( )

DEQ:    LEFT: 2    RIGHT: 6
  [1]   [2]   [3]   [4]   [5]   [6]
         C1    A4    Y7    N6    K9

(iii) DEQ after the execution of operations (iv) INSERT(‘V7’, LEFT) and
(v) INSERT(‘T5’, LEFT)

DEQ:    LEFT: 1    RIGHT: 6
  [1]   [2]   [3]   [4]   [5]   [6]
   V7    C1    A4    Y7    N6    K9

After the execution of operation INSERT(‘V7’, LEFT), the deque is full. Hence ‘T5’ is not inserted
into the deque.


Review Questions


1. Which among the following properties does not hold good in a queue? 
(i) A queue supports the principle of First come First served. 
(ii) An enqueuing operation shrinks the queue length 
(iii) A dequeuing operation affects the front end of the queue. 
(iv) An enqueuing operation affects the rear end of the queue 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
2. A linear queue Q is implemented using an array as shown below. The FRONT and REAR 
pointers which point to the physical front and rear of the queue, are also shown. 
FRONT: 2 REAR: 3 
X Y A Z, 5 
[1] [2] [3] [4] [5] 
Execution of the operation ENQUEUE( Q, ‘W’) would yield the FRONT and REAR 
pointers to respectively carry the values shown in 
(a) 2 and 4 (b) 3 and 3 (c) 3 and 4 (d) 2 and 3 
3. For the linear queue shown in Review Question 2 of Chapter 5, execution of the operation 
DEQUEUE(Q, M) where M is an output variable would yield M, FRONT and REAR to 
respectively carry the values 
(a) Z, 2, 3 (b) A, 2, 2 (c) Y, 3, 3 (d) A, 2, 3
4. Given the following array implementation of a circular queue, with FRONT and REAR 


pointing to the physical front and rear of the queue, 
FRONT: 3 REAR: 4 


[1] [2] [3] [4] [5] 

Execution of the operations ENQUEUE(Q, ‘H’), ENQUEUE(Q, ‘T’) done in a sequence
would result in

(i) Invoking Queue full condition soon after ENQUEUE(Q, ‘H’) operation
(ii) Aborting the ENQUEUE(Q, ‘T’) operation
(iii) Yielding FRONT = 1 and REAR = 4, after the operations
(iv) Yielding FRONT = 3 and REAR = 1, after the operations

(a) (i) (b) (ii) (c) (iii) (d) (iv) 

5. State whether true or false: 

For the following implementation of a queue, where FRONT and REAR point to the 
physical front and rear of the queue, 







FRONT: 3 REAR: 5 
X Y A 4 S 


[1] [2] [3] [4] [5] 
Execution of the operation ENQUEUE(Q, ‘C’), 
(i) if Q is a linear queue, would invoke the Queue full condition 
(ii) if Q is a circular queue would abort the enqueuing operation 
(a) (i) true (ii) true (b) (i) true (ii) false (c) (i) false (ii) false (d) (i) false (ii) true 
6. What are the disadvantages of linear queues? 
7. How do circular queues help overcome the disadvantages of linear queues?
8. If FRONT and REAR were pointers to the physical front and rear of a linear queue, 
comment on the condition, FRONT = REAR. 
9. If FRONT and REAR were pointers to the physical front and rear of a circular queue, 
comment on the condition, FRONT = REAR. 
10. How are priority queues implemented using a single queue? 
11. The following is a table of five users Tim, Shiv, Kali, Musa and Lobo, with their job requests
J_i(k) where i is the job number and k is the time required to execute the job. The times at
which the users logged in are also shown in the table.


Job requests and the execution time in µ secs.    Login time


Jy ©), h (4) 
J (3), Ja), J5 A) 


Je (6), Jz (9) 
J86), Jo (1) 
Jo (3), ho ©), Ju (6) 





Throughout the simulation, assume a uniform user delay period of 4 µ secs between any
two sequential job requests initiated by a user. Also, to simplify simulation, assume that the
CPU gives whole attention to the completion of a job request before moving to the next job 
request. Trace a graphical illustration of the simulation to demonstrate a time sharing 
system at work. Show snapshots of the linear queue used by the system, to implement the 
FIFO principle of attending to jobs by the CPU. 

12. For the time sharing system discussed in Review Question 11 of Chapter 5, trace a graphical
illustration of the simulation assuming that all job requests J_i(k) where i is even numbered
have higher priority than those jobs J_i(k) where i is odd numbered. Show snapshots of the
priority queue implementation.


Programming Assignments


1. Waiting line simulation in a post office: 
In a post office, a lone postal worker serves a single queue of customers. Every customer 
receives a token # (serial number) as soon as he/she enters the queue. After service, the token 
is returned to the postal worker and the customer leaves the queue. At any point of time 
the worker may want to know how many customers are yet to be served. 







(i) Implement the system using an appropriate queue data structure, simulating a random 
arrival and departure of customers after service completion. 

(ii) If a customer arrives to operate his/her savings account at the post office, then he/she 
is attended to first by permitting him/her to join a special queue. In such a case the 
postal worker attends to them immediately before resuming his/her normal service. 
Modify the system to implement this addition in service. 

2. Write a program to maintain a list of items as a circular queue which is implemented using
an array. Simulate insertions and deletions to the queue and display a graphical
representation of the queue after every operation.

3. Let PQUE be a priority queue data structure and a_1^(p_1), a_2^(p_2), ..., a_n^(p_n) be n elements with
priorities p_i (0 ≤ p_i ≤ m - 1).

(i) Implement PQUE using multiple circular queues one for each priority number. 

(ii) Implement PQUE as a two dimensional array ARR_PQUE[1:m, 1:d] where m is the 
number of priority values and d is the maximum number of data items with a given 
priority. 

(iii) Execute insertions and deletions presented in a random sequence. 

4. A deque DQUE is to be implemented using a circular one dimensional array of size N.
Execute procedures to 

(i) Insert and delete elements from DQUE at either ends 

(ii) Implement DQUE as an output restricted deque 

(iii) Implement DQUE as an input restricted deque 

(iv) For the procedures, what are the conditions used for testing DQUE_FULL and 
DQUE_EMPTY? 

5. Execute a general data structure which is a deque supporting insertions and deletions at
both ends but depending on the choice input by the user, functions as a stack or a queue. 


CHAPTER 6

LINKED LISTS

6.1 Introduction
6.2 Singly Linked Lists
6.3 Circularly Linked Lists
6.4 Doubly Linked Lists
6.5 Multiply Linked Lists
6.6 Applications

In Part I of the book we dealt with arrays, stacks and queues, which are linear sequential data
structures (of these, stacks and queues have a linked representation as well, which will be
discussed in Chapter 7).

In this chapter we detail linear data structures having a linked representation. We first list
the demerits of the sequential data structure before introducing the need for a linked representation.
Next, the linked data structures of singly linked list, circularly
linked list, doubly linked list and multiply linked list are
elaborately presented. Finally, two problems, viz., Polynomial
addition and Sparse matrix representation, demonstrating the
application of linked lists are discussed.


Introduction 6.1 





Drawbacks of sequential data structures 


Arrays are fundamental sequential data structures. Even stacks and queues rely on arrays for their 
representation and implementation. However, arrays or sequential data structures in general, 
suffer from the following drawbacks: 

(i) inefficient implementation of insertion and deletion operations and 

(ii) inefficient use of storage memory. 

Let us consider an array A[1 : 20]. This means a contiguous set of twenty memory locations 
have been made available to accommodate the data elements of A. As shown in Fig. 6.1(a), let us 
suppose the array is partially full. Now, to insert a new element 108 in the position indicated, it 
is not possible to do so without affecting the neighbouring data elements from their positions. 
Methods such as making use of a temporary array (B) to hold the data elements of A with 108 
inserted at the appropriate position or making use of B to hold the data elements of A which 
follow 108, before copying B into A, call for extensive data movement which is computationally 
expensive. Again, attempting to delete 217 from A calls for the use of a temporary array B to hold 
the elements with 217 excluded, before copying B to A. (Fig. 6.1) 
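The data movement involved can be made concrete with a small Python sketch. It is only an illustration: the function names insert_at and delete_at are assumptions, and the sketch shifts the elements within the array itself rather than copying them through a temporary array B. Either way, every element that follows the affected position has to be moved.

```python
def insert_at(array, count, pos, item):
    """Insert item at index pos of an array holding count elements.

    Returns the number of elements that had to be moved.
    """
    if count == len(array):
        raise OverflowError("array is full")
    moves = 0
    for i in range(count, pos, -1):     # shift the tail one cell to the right
        array[i] = array[i - 1]
        moves += 1
    array[pos] = item
    return moves


def delete_at(array, count, pos):
    """Delete the element at index pos; returns the number of elements moved."""
    moves = 0
    for i in range(pos, count - 1):     # shift the tail one cell to the left
        array[i] = array[i + 1]
        moves += 1
    array[count - 1] = None
    return moves


a = [11, 34, 217, 346, None, None]      # sample contents chosen for illustration
print(insert_at(a, 4, 2, 108), a)       # 2 [11, 34, 108, 217, 346, None]
print(delete_at(a, 5, 3), a)            # 1 [11, 34, 108, 346, None, None]
```

In the worst case, an insertion or deletion at the first position moves all the elements present in the array.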













Fig. 6.1 Drawbacks of sequential data structures: inefficient implementation of insertion/deletion operations
(a) Insertion in a sequential data structure: array A is copied to array B[1:20] with 108 inserted, and B is recopied to A.
(b) Deletion in a sequential data structure: array A is copied to array B with 217 deleted, and B is recopied to A.


With regard to the second drawback of inefficient storage memory management, the need for 
allotting contiguous memory locations for every array declaration is bound to leave fragments of 
free memory space unworthy of allotment for future requests. This eventually may lead to 
inefficient storage management. In fact, fragmentation of memory is a significant problem to be 
reckoned with in computer science. Several methods have been proposed to counteract this
problem.
Figure 6.2 shows a simple diagram of a storage memory with fragmentation of free space. 


[Figure 6.2: diagram of a storage memory in which hatched regions denote free space and blank regions denote reserved space]

Fig. 6.2 Drawbacks of sequential data structures: inefficient storage memory management


Note that although the fragments of free memory space, put together, could amount to a huge
chunk of free space, their lack of contiguity renders them unfit to accommodate sequential data
structures.


Merits of linked data structures 


A linked representation serves to counteract the drawbacks of sequential representation by
exhibiting the following merits:

(i) Efficient implementation of insertion and deletion operations. Unlike sequential data
structures, there is complete absence of data movement of neighbouring elements during
the execution of these operations.







(ii) Efficient use of storage memory. The operation and management of linked data structures 
are less prone to create memory fragmentation. 

A linked representation of data structure known as a linked list is a collection of nodes. Each 
node is a collection of fields categorized as data items and links. The data item fields hold the 
information content or data to be represented by the node. The link fields hold the addresses of 
the neighbouring nodes or of those nodes which are associated with the given node as dictated 
by the application. 

Figure 6.3 illustrates the general node structure of a linked list. A node is represented by a 
rectangular box and the fields are shown by partitions in the box. Link fields are shown to carry 
arrows to indicate the nodes to which the given node is linked or connected. 


[Figure 6.3: a node drawn as a rectangular box partitioned into data item fields (DATA ITEM 1, DATA ITEM 2, ..., DATA ITEM N) followed by link fields (LINK 1, LINK 2, ..., LINK M)]

Fig. 6.3 A general structure of a node in a linked list


This implies that unlike arrays, no two nodes in a linked list need be physically contiguous. 
All the nodes in a linked list data structure may in fact be strewn across the storage memory 
making effective use of what little space is available to represent a node. However, the link fields 
carry on themselves the onerous responsibility of remembering the addresses of the other 
neighbouring or associated nodes, to keep track of the data elements in the list. 

In programming language parlance, the link fields are referred to as pointers. In this book, 
pointers and link fields will be interchangeably used in several contexts. 
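
The general node structure described above can be sketched in a few lines of Python. This is only an illustration: the class name `Node` and the field names `data` and `links` are assumptions made for the sketch and do not come from the text, and Python object references stand in for the addresses held by link fields.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    """A node with N data item fields and M link fields (cf. Fig. 6.3)."""
    data: List[Any]                                               # DATA ITEM 1 .. DATA ITEM N
    links: List[Optional["Node"]] = field(default_factory=list)  # LINK 1 .. LINK M


# Two nodes that need not be adjacent in memory; the link records the association.
second = Node(data=["b"], links=[None])     # None plays the role of a null link
first = Node(data=["a"], links=[second])    # LINK 1 of `first` refers to `second`

print(first.links[0].data)                  # follows the link from `first` to `second`
```

Because a link holds a reference rather than a position, the two nodes can live anywhere in memory, which is the property the paragraph above relies on.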

To implement linked lists the following mechanisms are essential: 

(i) A mechanism to frame chunks of memory into nodes with the desired number of data items
and fields.
In most programming languages, this mechanism is implemented by making use of a
‘record’ or ‘structure’ or its look-alikes or even associated structures, to represent the node
and its fields.

(ii) A mechanism to determine which nodes are free and which have been allotted for use. 

(iii) A mechanism to obtain nodes from the free storage area or storage pool for use. 
These are fully provided and managed by the system. There is very little that an end user 
or a programmer can do to handle this mechanism by oneself. This is made possible in 
many programming languages by the provision of inbuilt functions which help execute 
requests for a node with the specific fields. In this book, we make use of a function 
GETNODE (X) to implement this mechanism. The GETNODE (X) function allots a node of 
the desired structure and the address of the node viz. X, is returned. In other words, X is 
an output parameter of the function GETNODE (X), whose value is determined and 
returned by the system. 

(iv) A mechanism to return or dispose of nodes from the reserved area or pool to the free area
after use.
This is also made possible in many programming languages by providing an in-built
function which helps return or dispose of the node after use. In this book we make use of
the function RETURN(X) to implement this mechanism. The RETURN(X) function returns
a node with address X, from the reserved area of the pool, to the free area of the pool. In
other words, X is an input parameter of the function, the value of which is to be provided
by the user.
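
To make mechanisms (ii) to (iv) concrete, the following is a minimal sketch of a storage pool, assuming a fixed number of nodes addressed by array index. The class `NodePool`, the marker `NIL = -1` and the method names are hypothetical; real systems manage free storage in their own ways, and this is not presented as how GETNODE and RETURN are actually implemented.

```python
NIL = -1  # assumed marker for a null link


class NodePool:
    """A fixed-size pool of nodes addressed by array index (hypothetical sketch)."""

    def __init__(self, size):
        self.data = [None] * size
        # Initially every node is free; the free nodes are chained through LINK.
        self.link = [i + 1 for i in range(size)]
        if size:
            self.link[-1] = NIL
        self.avail = 0 if size else NIL      # address of the first free node

    def getnode(self):
        """GETNODE(X): take a node from the free area and return its address X."""
        if self.avail == NIL:
            raise MemoryError("storage pool exhausted")
        x = self.avail
        self.avail = self.link[x]
        self.data[x], self.link[x] = None, NIL
        return x

    def return_node(self, x):
        """RETURN(X): hand node X back to the free area."""
        self.data[x] = None
        self.link[x] = self.avail
        self.avail = x


pool = NodePool(3)
a = pool.getnode()
b = pool.getnode()
pool.return_node(a)
c = pool.getnode()      # the node just returned is handed out again
```

The free area is itself kept as a linked list here, so telling free nodes from allotted ones costs no extra bookkeeping beyond the single pointer `avail`.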

Irrespective of the number of data item fields, a linked list is categorized as singly linked list, 
doubly linked list, circularly linked list and multiply linked list based on the number of link fields it 
owns and/or its intrinsic nature. Thus a linked list with a single link field is known as singly linked 
list and the same with a circular connectivity is known as circularly linked list. On the other hand, 
a linked list with two links each pointing to the predecessor and successor of a node is known as a 
doubly linked list and the same with multiple links is known as multiply linked list. The following 
sections discuss these categories of linked lists in detail. 





Singly Linked Lists 6.2 


Representation of a singly linked list 


A singly linked list is a linear data structure, each node of which has one or more data item fields 
(DATA) but only a single link field (LINK). 

Figure 6.4 illustrates an example of a singly linked list and its node structure. Observe that the 
node in the list carries a single link which points to the node representing its immediate successor 
in the list of data elements. 


[Figure 6.4: (a) Singly linked list, with START pointing to the first node; (b) Structure of the node, with a DATA field and a LINK field]

Fig. 6.4 A singly linked list and its node structure


Every node, which is basically a chunk of memory, carries an address. When a set of data
elements to be used by an application are represented using a linked list, each data element is
represented by a node. Depending on the information content of the data element, one or more
data item fields may be provided in the node. However, in a singly linked list only a single link field
is used to point to the node which represents its neighbouring element in the list. The last node
in the linked list has its link field empty. The empty link field is also referred to as null link or
in programming language parlance, null pointer. The notation NIL, a ground symbol
or a zero (0) is commonly used to indicate null links. The entire linked list is kept track of by
remembering the address of the start node. This is indicated by START in the figure. Obviously
it is essential that the START pointer is carefully handled, lest the entire list be lost.


Example Consider a list SPACE-MISSION of four data elements as shown in Fig. 6.5(a).
This logical representation of the list has each node carrying three DATA fields viz., name of the
space mission, country of origin, the current status of the mission, and a single link pointing to
the next node. Let us suppose the nodes which house ‘Chandra’, ‘INSAT-3A’, ‘Mir’ and ‘Planck’
have addresses 1001, 16002, 0026 and 8456 respectively. Figure 6.5(b) shows the physical
representation of the linked list. Note how the nodes are distributed all over the storage memory
and not physically contiguous. Also observe how the LINK field of each node remembers the
address of the node of its logical neighbour. The LINK field of the last node is NIL. The arrows
in the logical representation represent the addresses of the neighbouring nodes in its physical
representation.

[Figure 6.5: (a) Logical representation of SPACE-MISSION; each node has the structure Name of space mission | Country of origin | Status of the mission | Link. (b) Physical representation of SPACE-MISSION; the nodes reside at addresses 1001, 16002, 0026 and 8456, and the pointer SPACE-MISSION holds 1001.]

Fig. 6.5 A singly linked list: its logical and physical representation
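
The physical representation can be imitated with a dictionary that maps addresses to nodes. The sketch below is illustrative only: it keeps just the mission name and the LINK field of each node (the other two DATA fields are left out), writes address 0026 as the integer 26, and uses `None` as an assumed stand-in for NIL. The function name `traverse` is hypothetical.

```python
NIL = None  # assumed stand-in for the null link

# address -> (name of space mission, LINK); nodes are deliberately listed out of order
memory = {
    8456: ("Planck", NIL),
    26: ("Mir", 8456),            # address 0026 in the text
    1001: ("Chandra", 16002),
    16002: ("INSAT-3A", 26),
}
SPACE_MISSION = 1001              # START pointer: address of the first node


def traverse(memory, start):
    """Follow LINK fields from the start node until the null link is reached."""
    names = []
    address = start
    while address is not NIL:
        name, link = memory[address]
        names.append(name)
        address = link
    return names


print(traverse(memory, SPACE_MISSION))
```

The order in which the nodes sit in `memory` is irrelevant; only the start address and the LINK fields determine the logical order of the list.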


Insertion and deletion in a singly linked list 


To implement insertion and deletion in a singly linked list, one needs the two functions 
introduced in Sec. 6.1.2, viz, GETNODE(X) and RETURN(X) respectively. 







Insert operation Given a singly linked list START, to insert a data element ITEM into the list
to the right of node NODE (that is, ITEM is to be inserted as the successor of the data element
represented by node NODE), the steps to be undertaken are given below. Figure 6.6 illustrates the
logical representation of the insert operation.


(i) Call GETNODE(X) to obtain a node to accommodate ITEM. Node has address X. 
(ii) Set DATA field of node X to ITEM (i.e.) DATA(X) = ITEM. 
(iii) Set LINK field of node X to point to the original right neighbour of node NODE (i.e.) 
LINK(X) = LINK(NODE). 
(iv) Set LINK field of NODE to point to X (i.e.) LINK(NODE) = X. 
Algorithm 6.1 illustrates a pseudo code procedure for insertion in a singly linked list which 
is non empty. 





[Figure 6.6: the list START before insertion of ITEM, showing node NODE and its right neighbour LINK(NODE); the new node X holding ITEM; and the list START after insertion of ITEM to the right of NODE]

Fig. 6.6 Logical representation of insertion in a singly linked list


Algorithm 6.1: To insert a data element ITEM in a non empty singly linked list START, to
the right of node NODE

procedure INSERT_SL(START, ITEM, NODE)
/* Insert ITEM to the right of node NODE in the list START */
    Call GETNODE(X);
    DATA(X) = ITEM;
    LINK(X) = LINK(NODE);    /* Node X points to the original
                                right neighbour of node NODE */
    LINK(NODE) = X;
end INSERT_SL.


However, during insert operation in a list, it is advisable to test if START pointer is null or non-
null. If START pointer is null (START = NIL) then the singly linked list is empty and hence the
insert operation prepares to insert the data as the first node in the list. On the other hand, if
START pointer is non-null (START ≠ NIL), then the singly linked list is non empty and hence the
insert operation prepares to insert the data at an appropriate position in the list as specified by
the application. Algorithm 6.1 works on a non empty list. To handle empty lists the algorithm has
to be appropriately modified as illustrated in Algorithm 6.2.


Algorithm 6.2: To insert ITEM after node NODE in a singly linked list START

procedure INSERT_SL_GEN(START, NODE, ITEM)
/* Insert ITEM as the first node in the list if START
   is NIL. Otherwise insert ITEM after node NODE */
    Call GETNODE(X);
    DATA(X) = ITEM;              /* Create node for ITEM */
    if (START = NIL) then
        { LINK(X) = NIL;         /* List is empty */
          START = X; }           /* Insert ITEM as the first node */
    else
        { LINK(X) = LINK(NODE);
          LINK(NODE) = X; }      /* List is non empty. Insert ITEM
                                    to the right of node NODE */
end INSERT_SL_GEN.


In sheer contrast to an insert operation in a sequential data structure, observe the total absence 
of data movement in the list during insertion of ITEM. The insert operation merely calls for the 
update of two links in the case of a non empty list. 
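
A minimal Python sketch of the general insert procedure follows. It is an illustration under stated assumptions: the class `Node` and the helper `to_list` are hypothetical, creating a `Node` object plays the role of GETNODE(X), `None` stands for NIL, and the function returns the start pointer because Python cannot update the caller's START variable in place.

```python
class Node:
    def __init__(self, data, link=None):
        self.data = data    # DATA field
        self.link = link    # LINK field; None stands for NIL


def insert_sl_gen(start, node, item):
    """Insert item as the first node if the list is empty, else after `node`.

    Returns the (possibly new) start pointer.
    """
    x = Node(item)              # plays the role of GETNODE(X); DATA(X) = ITEM
    if start is None:           # empty list: X becomes the first node
        x.link = None
        return x
    x.link = node.link          # X points to the original right neighbour of NODE
    node.link = x               # NODE now points to X
    return start


def to_list(start):
    """Collect the DATA fields in list order (helper for inspection)."""
    items = []
    while start is not None:
        items.append(start.data)
        start = start.link
    return items


start = insert_sl_gen(None, None, "Chandra")      # empty list case
start = insert_sl_gen(start, start, "Mir")        # after the first node
start = insert_sl_gen(start, start, "INSAT-3A")   # lands between Chandra and Mir
```

No existing node is moved in either branch; the non-empty case touches exactly two link fields, that of the new node and that of NODE.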


Example 6.1 In the singly linked list SPACE-MISSION illustrated in Fig. 6.5(a-b), insert the 
following data elements: 


(i) APPOLLO | USA
(ii) SOYUZ 4 | USSR


Let us suppose the GETNODE(X) function releases nodes with addresses X = 646 and X = 1187 
to accommodate APPOLLO and SOYUZ 4 details respectively. The insertion of APPOLLO is 
illustrated in Fig. 6.7(a-b) and the insertion of SOYUZ 4 is illustrated in Fig. 6.7(c-d). 


Delete operation Given a singly linked list START, the delete operation can acquire
various forms such as deletion of a node NODEY next to that of a specific node NODEX, or more
commonly deletion of a particular element in a list and so on. We now illustrate the deletion of
a node which is the successor of node NODEX.

The steps for the deletion of a node next to that of NODEX in a singly linked list START are given
below. Figure 6.8 illustrates the logical representation of the delete operation.
(i) Set TEMP, a temporary variable, to point to the right neighbour of NODEX
(i.e.) TEMP = LINK(NODEX). The node pointed to by TEMP is to be deleted.
(ii) Set LINK field of node NODEX to point to the right neighbour of TEMP
(i.e.) LINK(NODEX) = LINK(TEMP).
(iii) Dispose of node TEMP (i.e.) RETURN(TEMP).


[Figure 6.7: (a) Insert APPOLLO in list SPACE-MISSION, logical representation. (b) Insert APPOLLO in list SPACE-MISSION, physical representation; APPOLLO occupies the node at address 646, and the pointer SPACE-MISSION, which holds 1001 before the insertion, holds 646 after it. (c) Insert SOYUZ4 in list SPACE-MISSION, logical representation; SOYUZ4 follows PLANCK. (d) Insert SOYUZ4 in list SPACE-MISSION, physical representation; SOYUZ4 occupies the node at address 1187.]

Fig. 6.7 Insertion of APPOLLO and SOYUZ4 in the SPACE-MISSION list shown in Fig. 6.5(a-b)


Algorithm 6.3 illustrates a pseudo-code procedure for the deletion of a node which occurs to 
the right of a node NODEX in a singly linked list START. However, as always, it needs to be 
ensured that the delete operation is not undertaken over an empty list. Hence it is essential to 
check if START is empty. 


Algorithm 6.3: Deletion of a node which is to the right of node NODEX in a singly linked
list START

procedure DELETE_SL(START, NODEX)
    if (START = NIL) then
        Call ABANDON_DELETE;
        /* ABANDON_DELETE terminates the delete operation */
    else
        { TEMP = LINK(NODEX);
          LINK(NODEX) = LINK(TEMP);
          Call RETURN(TEMP); }
end DELETE_SL.


Observe how in contrast to deletion in a sequential data structure which involves data 
movement, the deletion of a node in a linked list merely calls for the update of a single link. 
Example 6.2 illustrates deletion of a node in a singly linked list. 


Example 6.2 For the SPACE-MISSION list shown in Fig. 6.5(a-b) undertake the following 
deletions: 
(i) Delete CHANDRA 
(ii) Delete PLANCK 
The deletion of CHANDRA is illustrated in Fig. 6.9(a-b) and that of PLANCK is illustrated in 
Fig. 6.9 (c-d). 
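
The deletion of the right neighbour of a node can be sketched in Python as shown below. The names `Node`, `delete_sl` and `to_list` are assumptions made for the sketch. Two details go beyond Algorithm 6.3 and are labelled as such: the function also abandons the deletion when NODEX has no right neighbour, and clearing the link of the removed node merely stands in for RETURN(TEMP), since Python reclaims unreferenced objects by itself.

```python
class Node:
    def __init__(self, data, link=None):
        self.data = data
        self.link = link    # None stands for NIL


def delete_sl(start, nodex):
    """Delete the right neighbour of nodex; return its data, or None if abandoned."""
    if start is None or nodex is None or nodex.link is None:
        return None             # ABANDON_DELETE (the last two checks are additions)
    temp = nodex.link           # TEMP = LINK(NODEX)
    nodex.link = temp.link      # LINK(NODEX) = LINK(TEMP)
    temp.link = None            # stand-in for RETURN(TEMP)
    return temp.data


def to_list(start):
    items = []
    while start is not None:
        items.append(start.data)
        start = start.link
    return items


planck = Node("Planck")
mir = Node("Mir", planck)
insat = Node("INSAT-3A", mir)
start = Node("Chandra", insat)

deleted = delete_sl(start, mir)     # removes the successor of Mir
```

Deleting the first node of the list, as with CHANDRA in Example 6.2, has no predecessor to pass as NODEX; that case is handled by moving the start pointer itself and is not covered by this sketch.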


[Figure 6.8: the original linked list START, showing NODEX, its right neighbour LINK(NODEX) referred to by TEMP, and LINK(TEMP); and the singly linked list after deletion of the right node of NODEX]

Fig. 6.8 Logical representation of deletion in a singly linked list





Circularly Linked Lists 6.3 


Representation 


A normal singly linked list has its last node carrying a null pointer. For further improvement in
processing one may replace the null pointer in the last node with the address of the first node
in the list. Such a list is called a circularly linked list or a circular linked list or simply a
circular list. Figure 6.10 illustrates the representation of a circular list.


Advantages of circularly linked lists over singly linked lists 


(i) The most important advantage pertains to the accessibility of a node. One can access any 
node from a given node due to the circular movement permitted by the links. One has to 
merely loop through the links to reach a specific node from a given node. 

(ii) The second advantage pertains to delete operations. Recall that for deletion of a node X in
a singly linked list, the address of the preceding node (for example node Y) is essential, to
enable updating the LINK field of Y to point to the successor of node X. This necessity arises
from the fact that in a singly linked list, one cannot access a node’s predecessor due to the
‘forward’ movement of the links. In other words, LINK fields in a singly linked list point
to successors and not predecessors.

However, in the case of a circular list, to delete node X one need not specify the predecessor.
It can be easily determined by a simple ‘circular’ search through the list before deletion of
node X.

(iii) The third advantage is the relative efficiency in the implementation of list based operations
such as concatenation of two lists, erasing a whole list, splitting a list into parts and so on.
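
The 'circular' search for a predecessor mentioned in advantage (ii) can be sketched as follows. This is an illustration for a circular list without a head node; the names `Node`, `predecessor`, `delete_node` and `make_circular` are hypothetical, and a node that is alone in its list is assumed to link to itself.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.link = self        # a single node circles back to itself


def predecessor(x):
    """Walk round the circle from x until the node whose LINK is x is found."""
    y = x
    while y.link is not x:
        y = y.link
    return y


def delete_node(x):
    """Unlink x from its circular list; return its predecessor (None if x was alone)."""
    y = predecessor(x)
    if y is x:
        return None
    y.link = x.link
    x.link = x
    return y


def make_circular(items):
    """Build a circular list from items and return its nodes (helper)."""
    nodes = [Node(item) for item in items]
    for current, following in zip(nodes, nodes[1:] + nodes[:1]):
        current.link = following
    return nodes


a, b, c = make_circular(["a", "b", "c"])
pred = delete_node(b)           # only b is supplied; a is found by the search
```

The search visits every other node in the worst case, so the convenience of not supplying the predecessor is paid for with time proportional to the length of the list.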


Disadvantages of circularly linked lists 


The only disadvantage of circularly linked lists is that during processing one has to make sure 
that one does not get into an infinite loop owing to the circular nature of pointers in the list. This 
is liable to occur owing to the absence of a node which will help point out the end of the list and 
thereby terminate processing. 


[Figure 6.9: (a) Delete CHANDRA from list SPACE-MISSION, logical representation. (b) Delete CHANDRA from list SPACE-MISSION, physical representation; before the deletion the nodes at addresses 1001, 16002, 0026 and 8456 are linked in that order, and after the deletion the pointer SPACE-MISSION holds 16002. (c) Delete PLANCK from list SPACE-MISSION, logical representation. (d) Delete PLANCK from list SPACE-MISSION, physical representation; the nodes at addresses 16002 and 0026 remain.]

Fig. 6.9 Deletion of CHANDRA and PLANCK from the SPACE-MISSION list

[Figure 6.10: a circular list whose last node links back to the first node]

Fig. 6.10 Representation of a circular list


A solution to this problem is to designate a special node to act as the head of the list. This 
node, known as list head or head node has its advantages other than pointing to the beginning 
of a list. The list can never be empty and represented by a ‘hanging’ pointer (START = NIL) as 
was the case with empty singly linked lists. The condition for an empty circular list becomes 
(LINK(HEAD) = HEAD), where HEAD points to the head node of the list. Such a circular list is 
known as a headed circularly linked list or simply circularly linked list with head node. 
Figure 6.11 illustrates the representation of a headed circularly linked list. 


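[Figure 6.11: (a) Non empty list, with HEAD pointing to a head node (DATA | LINK) whose DATA field is shaded; (b) Empty list, in which the LINK field of the head node points to the head node itself]

Fig. 6.11 A headed circularly linked list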


Though the head node has the same structure as the other nodes in the list, the DATA field 
of the node is unused and is indicated as a shaded field in the pictorial representation. However, 
in practical applications these fields may be utilized to represent any useful information about the 
list relevant to the application, provided they are deftly handled and do not create confusion 
during the processing of the nodes. 

Example 6.3 illustrates the functioning of circularly linked lists. 


Example 6.3 Let CARS be a headed circularly linked list of four data elements as shown in 
Fig. 6.12(a). To insert MARUTI into the list CARS, the sequence of steps to be undertaken are as 
shown in Fig. 6.12(b-d). To delete FORD from the list CARS shown in Fig. 6.13(a) the sequence 
of steps to be undertaken are shown in Fig. 6.13(b-d). 
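
A headed circularly linked list in the spirit of Example 6.3 can be sketched as below. The sketch is hedged in several ways: the helper names are hypothetical, only three car names are used, and their order and the position at which MARUTI is inserted (after FORD) are assumptions chosen for illustration rather than a transcription of Fig. 6.12. Removing the last reference to a node stands in for RETURN.

```python
class Node:
    def __init__(self, data=None):
        self.data = data        # unused in the head node
        self.link = self        # a lone head node points to itself


def is_empty(head):
    return head.link is head    # the condition LINK(HEAD) = HEAD


def find_with_previous(head, item):
    """Return (PREVIOUS, HERE) for the node holding item, or (None, None)."""
    previous, here = head, head.link
    while here is not head:     # meeting the head node again ends the search
        if here.data == item:
            return previous, here
        previous, here = here, here.link
    return None, None


def insert_after(previous, item):
    x = Node(item)              # plays the role of GETNODE(X)
    x.link = previous.link      # LINK(X) = LINK(PREVIOUS)
    previous.link = x           # LINK(PREVIOUS) = X
    return x


def delete_item(head, item):
    previous, here = find_with_previous(head, item)
    if here is None:
        return False
    previous.link = here.link   # LINK(PREVIOUS) = LINK(HERE)
    return True


def items(head):
    out, node = [], head.link
    while node is not head:
        out.append(node.data)
        node = node.link
    return out


cars = Node()                   # head node of the list CARS
tail = cars
for name in ["CHRYSLER", "FORD", "SANTRO"]:
    tail = insert_after(tail, name)

_, ford = find_with_previous(cars, "FORD")
insert_after(ford, "MARUTI")
after_insert = items(cars)
delete_item(cars, "FORD")
after_delete = items(cars)
```

Every loop terminates on reaching the head node, which is how the head node removes the risk of cycling endlessly through the list.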


Primitive operations on circularly linked lists 


Some of the important primitive operations executed on a circularly linked list are detailed below. 
Here P is a circularly linked list as illustrated in Fig. 6.14(a). 
(i) Insert an element A as the left most element in the list represented by P. 
The sequence of operations to execute the insertion is: 


Call GETNODE (X); 


DATA (X) = A; 
LINK (X) = LINK(P); 
LINK (P) = X; 


Figure 6.14(b) illustrates the insertion of A as the left most element in the circular list P. 
(ii) Insert an element A as the right most element in the list represented by P.
The sequence of operations to execute the insertion is the same as that of inserting A as
the left most element in the list, followed by the instruction

P = X;

Figure 6.14(c) illustrates the insertion of A as the right most element in list P.

[Figure 6.12: (a) The headed circularly linked list CARS. (b) Get new node X using GETNODE(X) and store MARUTI into it. (c) Obtain the address of the preceding node (PREVIOUS) to insert node X into the list CARS. (d) Set/reset links to insert MARUTI into the list CARS: LINK(X) = LINK(PREVIOUS); LINK(PREVIOUS) = X.]

Fig. 6.12 Insertion of MARUTI into the headed circularly linked list CARS

(iii) Set Y to the data of the left most node in the list P and delete the node. 
The sequence of operations to execute the deletion are: 


PTR = LINK(P); 
Y = DATA (PTR); 
LINK (P) = LINK(PTR); 


Call RETURN (PTR); 


Here PTR is a temporary pointer variable. Figure 6.14(d) illustrates the deletion of the left
most node in the list P, setting Y to its data.

Observe that the primitive operations (i) and (iii), when combined, result in the circularly
linked list working as a stack, and operations (ii) and (iii), when combined, result in the circularly
linked list working as a queue.
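
The three primitive operations can be sketched in Python as follows, assuming that P refers to the right most node of the circular list so that LINK(P) is the left most node. The handling of an empty list (P is `None`) is an addition of this sketch and is not spelt out in the text; the function names are hypothetical.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.link = self        # a single node circles back to itself


def insert_left(p, a):
    """Operation (i): insert a as the left most element; returns P."""
    x = Node(a)
    if p is None:               # empty list (assumption beyond the text)
        return x
    x.link = p.link
    p.link = x
    return p


def insert_right(p, a):
    """Operation (ii): as insert_left, followed by P = X."""
    x = Node(a)
    if p is not None:
        x.link = p.link
        p.link = x
    return x


def delete_left(p):
    """Operation (iii): return (Y, P) where Y is the data of the left most node."""
    if p is None:
        raise IndexError("delete from an empty list")
    ptr = p.link
    y = ptr.data
    if ptr is p:                # the only node is being removed
        return y, None
    p.link = ptr.link
    return y, p


stack, popped = None, []
for value in (1, 2, 3):
    stack = insert_left(stack, value)        # push
while stack is not None:
    y, stack = delete_left(stack)            # pop
    popped.append(y)

queue, served = None, []
for value in (1, 2, 3):
    queue = insert_right(queue, value)       # enqueue
while queue is not None:
    y, queue = delete_left(queue)            # dequeue
    served.append(y)
```

Both ends of the list are reachable in constant time from the single pointer P, which is why one structure serves as a stack and as a queue.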





[Figure 6.13: (a) The headed circularly linked list CARS. (b) Obtain the address of the node containing FORD (node HERE) by searching for it in the list CARS and its predecessor node (node PREVIOUS). (c) Reset links to delete FORD: LINK(PREVIOUS) = LINK(HERE). (d) Dispose of node HERE: RETURN(HERE).]

Fig. 6.13 Deletion of FORD from the headed circularly linked list CARS


Other operations on circularly linked lists 


The concatenation of two circularly linked lists L1, L2 as illustrated in Fig. 6.15 has the following
sequence of instructions.

if L1 ≠ NIL then
   { if L2 ≠ NIL then
        { TEMP = LINK(L1)
          LINK(L1) = LINK(L2)
          LINK(L2) = TEMP
          L1 = L2 }
   }

The other operations are splitting a list into two parts (Programming Assignment P6.2) and
erasing a list.
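
A short Python sketch of the concatenation follows. It assumes that each list pointer refers to the right most node of its circular list, as P does for the primitive operations, and it returns the pointer to the combined list instead of reassigning a variable. The treatment of an empty first list (returning the second list) and the helper names `build` and `to_list` are assumptions of the sketch.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.link = self


def build(items):
    """Build a circular list; the returned pointer refers to the right most node."""
    p = None
    for item in items:
        x = Node(item)
        if p is not None:
            x.link = p.link
            p.link = x
        p = x
    return p


def to_list(p):
    """List the data from the left most node, LINK(P), round to P."""
    if p is None:
        return []
    out, node = [], p.link
    while True:
        out.append(node.data)
        if node is p:
            return out
        node = node.link


def concatenate(l1, l2):
    """Join two circular lists by exchanging LINK(L1) and LINK(L2)."""
    if l1 is None:
        return l2
    if l2 is None:
        return l1
    l1.link, l2.link = l2.link, l1.link
    return l2                   # the right most node of the result


joined = concatenate(build([1, 2]), build([3, 4]))
```

Only two link fields change regardless of the lengths of the lists, which illustrates the efficiency claimed for list based operations on circular lists.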


[Figure 6.14: (a) Circularly linked list P, showing LINK(P). (b) Insert A as the left most element in list P. (c) Insert A as the right most element in list P. (d) Delete leftmost node in list P and set Y to the data of the node, Y = DATA(PTR). Deleted links are marked in the figure.]

Fig. 6.14 Some primitive operations on a circularly linked list P


Doubly Linked Lists 6.4 


In Secs 6.2 and 6.3 we discussed two types of linked representations viz., singly linked list and 
circularly linked list, both making use of a single link. Also, the circularly linked list served to 
rectify the drawbacks of the singly linked list. To enhance greater flexibility of movement, the 
linked representation could include two links in every node, each of which points to the nodes 
on either side of the given node. Such a linked representation known as doubly linked list is 
discussed in this section. 


Representation of a doubly linked list 


A doubly linked list is a linked linear data structure, each node of which has one or more data
fields but only two link fields termed left link (LLINK) and right link (RLINK). The LLINK field
of a given node points to the node on its left and its RLINK field points to the one on its right.
A doubly linked list may or may not have a head node. Again, it may or may not be circular.
Figure 6.16 illustrates the structure of a node in a doubly linked list and the various types of
lists.

[Figure 6.15: (a) Circularly linked lists L1, L2 before concatenation operation. (b) Circularly linked lists L1, L2 after concatenation operation, with deleted links and links after concatenation marked.]

Fig. 6.15 Concatenation of two circularly linked lists
Example 6.4 illustrates a doubly linked list and its logical and physical representations. 


Example 6.4 Consider a list FLOWERS of four data elements LOTUS, CHRYSANTHEMUM, 
LILY and TULIP stored as a circular doubly linked list with a head node. The logical and physical 
representation of FLOWERS has been illustrated in Fig. 6.17 (a-b). Observe how the LLINK and 
RLINK fields store the addresses of the predecessors and successors of the given node respectively. 
In the case of FLOWERS being an empty list, the representation is as shown in Fig. 6.17 (c-d) 


Advantages and disadvantages of a doubly linked list 


Doubly linked lists have the following advantages: 
(i) The availability of two links LLINK and RLINK permit forward and backward movement 
during the processing of the list. 

(ii) The deletion of a node X from the list calls only for the value X to be known. Contrast how
in the case of a singly linked or circularly linked list, the delete operation necessarily needs
to know the predecessor of the node to be deleted. While a singly linked list expects the
predecessor of the node to be deleted, to be explicitly known, a circularly linked list is
endowed with the capability to move round the list to find the predecessor node. However,
in the latter case, if the list is too long it may render the delete operation inefficient.

[Figure 6.16: the node structure of a doubly linked list, LLINK | DATA | RLINK; a simple doubly linked list pointed to by START; a circular doubly linked list pointed to by START; and a circular doubly linked list with a head node]

Fig. 6.16 Node structure of a doubly linked list and the various list types

The only disadvantage of the doubly linked list is its memory requirement. That each node
needs two links could be considered expensive storage-wise, when compared to singly linked
lists or circular lists. Nevertheless, the efficiency of operations due to the availability of two links
more than compensates for the extra space requirement.


Operations on doubly linked lists 


An insert and delete operation on a doubly linked list are detailed here. 


Insert Operation Let P be a headed circular doubly linked list which is non empty. 
Algorithm 6.4 illustrates the insertion of a node X to the right of node Y. Figure 6.18(a) shows the 
logical representation of list P before and after insertion. 





[Figure 6.17: (a) Logical representation of a circular doubly linked list with a head node (FLOWERS). (b) Physical representation of a circular doubly linked list with a head node (FLOWERS). (c) Logical representation of an empty circular doubly linked list with a head node (FLOWERS). (d) Physical representation of an empty circular doubly linked list with a head node; the pointer FLOWERS holds 110, the address of the head node.]

Fig. 6.17 The logical and physical representation of a circular doubly linked list with a head node,
FLOWERS

Algorithm 6.4: To insert node X to the right of node Y in a headed circular doubly linked
list P

procedure INSERT_DL(X, Y)
    LLINK(X) = Y;
    RLINK(X) = RLINK(Y);
    LLINK(RLINK(Y)) = X;
    RLINK(Y) = X;
end INSERT_DL.


Note how the four instructions in the Algorithm 6.4 correspond to the setting / resetting of the
four link fields, viz., links pertaining to node Y, its original right neighbour (RLINK (Y)) and the
node X.


Delete operation Let P be a headed, circular doubly linked list. Algorithm 6.5 illustrates the
deletion of a node X from P. The condition (X = P) that is checked ensures that the head node P
is not deleted. Figure 6.18(b) shows the logical representation of list P before and after the
deletion of node X from the list P.
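
The four link assignments of the insert operation translate directly into Python, as sketched below. The class `DNode` and the helpers `forward` and `backward` are hypothetical names, and the sketch assumes that a head node with no other nodes has both of its links pointing to itself.

```python
class DNode:
    def __init__(self, data=None):
        self.data = data
        self.llink = self       # a lone head node points to itself both ways
        self.rlink = self


def insert_dl(x, y):
    """Insert node x to the right of node y in a headed circular doubly linked list."""
    x.llink = y
    x.rlink = y.rlink
    y.rlink.llink = x           # must precede the next line, which overwrites RLINK(Y)
    y.rlink = x


def forward(head):
    out, node = [], head.rlink
    while node is not head:
        out.append(node.data)
        node = node.rlink
    return out


def backward(head):
    out, node = [], head.llink
    while node is not head:
        out.append(node.data)
        node = node.llink
    return out


p = DNode()                     # head node
mars = DNode("MARS")
insert_dl(mars, p)
uranus = DNode("URANUS")
insert_dl(uranus, mars)
pluto = DNode("PLUTO")
insert_dl(pluto, mars)          # lands between MARS and URANUS
```

Under the stated assumption about the empty list, the same four assignments also cover insertion to the right of a lone head node, so no separate empty-list branch is needed in this sketch.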





Algorithm 6.5: Delete node X from a headed circular doubly linked list P

procedure DELETE_DL(P, X)
    if (X = P) then ABANDON_DELETE;
    else
        { RLINK(LLINK(X)) = RLINK(X);
          LLINK(RLINK(X)) = LLINK(X);
          Call RETURN(X); }
end DELETE_DL.

Note how the two instructions pertaining to links, in Algorithm 6.5, correspond to the setting /
resetting of link fields of the two nodes viz. the predecessor (LLINK (X)) and successor (RLINK
(X)) of node X.
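
The delete operation can be sketched in the same style. The names `DNode`, `append`, `delete_dl`, `forward` and `backward` are assumptions of the sketch, clearing the links of the removed node stands in for RETURN(X), and the function returns `False` where the algorithm abandons the deletion.

```python
class DNode:
    def __init__(self, data=None):
        self.data = data
        self.llink = self
        self.rlink = self


def append(head, data):
    """Add a node just before the head node, i.e. at the right end (helper)."""
    x = DNode(data)
    x.llink, x.rlink = head.llink, head
    head.llink.rlink = x
    head.llink = x
    return x


def delete_dl(p, x):
    """Delete node x from the headed circular doubly linked list p."""
    if x is p:
        return False                # ABANDON_DELETE: the head node is never deleted
    x.llink.rlink = x.rlink         # RLINK(LLINK(X)) = RLINK(X)
    x.rlink.llink = x.llink         # LLINK(RLINK(X)) = LLINK(X)
    x.llink = x.rlink = None        # stand-in for RETURN(X)
    return True


def forward(head):
    out, node = [], head.rlink
    while node is not head:
        out.append(node.data)
        node = node.rlink
    return out


def backward(head):
    out, node = [], head.llink
    while node is not head:
        out.append(node.data)
        node = node.llink
    return out


planet = DNode()                    # head node of the list PLANET
mars = append(planet, "MARS")
pluto = append(planet, "PLUTO")
uranus = append(planet, "URANUS")

deleted = delete_dl(planet, pluto)  # only the node itself is supplied
```

No search for a predecessor is needed, since the node carries the addresses of both of its neighbours.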
Example 6.5 illustrates the insert/delete operation on a doubly linked list PLANET. 


[Figure 6.18: (a) Insertion of node X into a headed circular doubly linked list P, after node Y, showing list P before and after insertion of the node. (b) Deletion of node X from a headed circular doubly linked list P, showing LLINK(X) and RLINK(X) and list P before and after the deletion.]

Fig. 6.18 Insertion/deletion in a headed circular doubly linked list


Example 6.5 Let PLANET be a headed circular doubly linked list with three data elements 
viz. MARS, PLUTO and URANUS. Figure 6.19 illustrates the logical and physical representation 
of the list PLANET. Figure 6.20(a) illustrates the logical and physical representation of list PLANET 
after the deletion of PLUTO and Fig. 6.20(b) the same after insertion of JUPITER. 


[Figure 6.19: logical representation of list PLANET, a head node followed by MARS, PLUTO and URANUS; and physical representation of list PLANET, with nodes at addresses 101, 876, 344 and 2112 and the pointer PLANET holding 101]

Fig. 6.19 Logical and physical representation of list PLANET


Multiply Linked Lists 6.5 


A multiply linked list, as its name suggests, is a linked representation with multiple data and link
fields. A general node structure of a multiply linked list is as shown in Fig. 6.21.

Since each link field connects a group of nodes representing the data elements of a global list
L, the multiply linked representation of the list L is a network of nodes which are connected to
one another based on some association. The link fields may or may not render their respective
lists to be circular or may or may not possess a head node.

Example 6.6 illustrates an example of a multiply linked list. 


Example 6.6 Let STUDENT be a multiply linked list representation whose node structure 
is as shown in Fig. 6.22. Here, SPORTS-CLUB-MEM link field links all student nodes who are 
members of the sports club. DEPT-ENROLL links all students enrolled with a given department 
and DAY-STUDENT links all students enrolled as day students. 

Consider Table 6.1 illustrating details pertaining to 6 students. 


Table 6.1 Student details for representation as a multiply linked list

Name of the Student | Number of Credits Registered | Sports Club Membership | Day Student | Department










(a) Logical and physical representation of list PLANET after deletion of PLUTO
(b) Logical and physical representation of list PLANET after insertion of JUPITER
Fig. 6.20 Deletion of PLUTO and insertion of JUPITER in list PLANET
[Node fields: data fields DATA 1, DATA 2, ... and link fields LINK 1, LINK 2, ..., LINK m]
Fig. 6.21 The node structure of a multiply linked list







[Node fields: NAME OF THE STUDENT, ROLL #, NUMBER OF CREDITS REGISTERED, SPORTS-CLUB-MEM, DEPT-ENROLL, DAY-STUDENT]
Fig. 6.22 Node structure of the multiply linked list STUDENT




The multiply linked structure of the data elements in Table 6.1 is shown in Fig. 6.23. Here S 
is a singly linked list of all sports club members and D the singly linked list of all day students. 
Note how the DEPT-ENROLL link field maintains individual singly linked lists COMP-SC, MECH- 
SC, CIVIL-ENGG and ECE to keep track of the students enrolled with the respective departments. 
To insert a new node for the student ALI into the list STUDENT, the procedure is similar to that
of insertion in singly linked lists. The point of insertion is to be determined by the user. The
resultant list is shown in Fig. 6.24. Here we have inserted ALI in the alphabetical order of
students enrolled with the computer science department.

To delete REBECCA from the list of sports club members of the multiply linked list STUDENT, 
we undertake a sequence of operations as shown in Fig. 6.25. Observe how the node for 
REBECCA continues to participate in the other lists despite its deletion from the list S. 
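
The deletion described above can be sketched in Python. This is only an illustration under stated assumptions: the class `StudentNode`, the helpers `delete_from_list` and `names`, and the three sample records are hypothetical; only the three link field names are taken from Fig. 6.22.

```python
class StudentNode:
    """Node with a data part and three independent link fields."""
    def __init__(self, name):
        self.name = name
        self.sports_club_mem = None   # next sports club member
        self.dept_enroll = None       # next student of the same department
        self.day_student = None       # next day student


def delete_from_list(start, name, link):
    """Unlink `name` from the list threaded through `link`; return the new start.

    Only the chosen link field is changed, so the node stays in every other list.
    """
    prev, cur = None, start
    while cur is not None and cur.name != name:
        prev, cur = cur, getattr(cur, link)
    if cur is None:                    # not a member of this list
        return start
    nxt = getattr(cur, link)
    if prev is None:
        start = nxt                    # the first node of the list was removed
    else:
        setattr(prev, link, nxt)
    setattr(cur, link, None)
    return start


def names(start, link):
    """Collect the names met while following one link field."""
    out = []
    while start is not None:
        out.append(start.name)
        start = getattr(start, link)
    return out


# Hypothetical sample data: three students on the sports list S,
# two of them also on the day-student list D.
a, b, c = StudentNode("A"), StudentNode("REBECCA"), StudentNode("C")
a.sports_club_mem, b.sports_club_mem = b, c
b.day_student = c
S, D = a, b

S = delete_from_list(S, "REBECCA", "sports_club_mem")
print(names(S, "sports_club_mem"))   # ['A', 'C']
print(names(D, "day_student"))       # ['REBECCA', 'C']
```

Because each list is threaded through its own link field, removing a node from one list leaves the other link fields, and hence the other lists, untouched.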

A multiply linked list can be designed to accommodate a lot of flexibility with respect to its 
links depending on the needs and suitability of the application. 







Applications 6.6 


In this section we discuss two applications of linked lists viz., 







[Figure: the student nodes of Table 6.1 threaded by the SCM, DE and DS link fields into the lists S, D, COMP-SC, MECH-SC, CIVIL-ENGG and ECE]

SCM: SPORTS-CLUB-MEM link field
DE: DEPT-ENROLL link field
DS: DAY-STUDENT link field
S: List of sports club members
D: List of day students
COMP-SC: List of students enrolled with the Dept. of Computer Science
MECH-SC: List of students enrolled with the Dept. of Mech Sc.
CIVIL-ENGG: List of students enrolled with the Dept. of Civil Engg.
ECE: List of students enrolled with the Dept. of E.C.E.

Fig. 6.23 Multiply linked list structure of list STUDENT


(i) Addition of polynomials and 
(ii) Representation of a sparse matrix 
Addition of polynomials is illustrative of application of singly linked lists and sparse matrix 
representation that of multiply linked lists. 


Addition of polynomials 


The objective of this application is to perform a symbolic addition of two polynomials as
illustrated below:
Let $P_1 : 2x^6 + x^3 + 5x + 4$ and
$P_2 : 7x^6 + 8x^5 - 9x^3 + 10x^2 + 14$
be two polynomials over a variable x. The objective is to obtain the algebraic sum of $P_1$, $P_2$, i.e.
$P_1 + P_2$, as
$P_1 + P_2 = 9x^6 + 8x^5 - 8x^3 + 10x^2 + 5x + 18$
To perform this symbolic manipulation of the polynomials, we make use of a singly linked list 
to represent each polynomial. The node structure and the singly linked list representation for the 
two polynomials are given in Fig. 6.26. Here each node in the singly linked list represents a term 
of the polynomial. 
To add the two polynomials, we presume that the singly linked lists have their nodes arranged 
in the decreasing order of the exponents of the variable x. 







Fig. 6.24 Insert ALI into the multiply linked list STUDENT

Fig. 6.25 Delete REBECCA from the sports club membership list of the multiply linked list STUDENT







Node fields: COEFF | EXPONENT | LINK
COEFF : Coefficient of variable x in the term
EXPONENT : Exponent of variable x in the term

(a) Node structure of a term in a polynomial over a single variable x

P1: [2|6] -> [1|3] -> [5|1] -> [4|0]               (2x^6 + x^3 + 5x + 4)
P2: [7|6] -> [8|5] -> [-9|3] -> [10|2] -> [14|0]   (7x^6 + 8x^5 + (-9x^3) + 10x^2 + 14)

(b) Singly linked list representation of polynomials P1, P2

Fig. 6.26 Addition of polynomials: node structure and singly linked list representation of
polynomials


The objective is to create a new list of nodes representing the sum $P_1 + P_2$. This is achieved by
adding the COEFF fields of the nodes of like powers of variable x in lists $P_1$, $P_2$ and adding a new
node reflecting this operation in the resultant list $P_1 + P_2$. We present below the crux of the
procedure:

Here $P_1$, $P_2$ are the start pointers of the singly linked lists representing the polynomials $P_1$, $P_2$.
Also PTR1 and PTR2 are two temporary pointers initially set to $P_1$ and $P_2$ respectively.


   if (EXPONENT (PTR1) = EXPONENT (PTR2)) then
      /* PTR1 and PTR2 point to like terms */
      if (COEFF (PTR1) + COEFF (PTR2)) ≠ 0 then
      { Call GETNODE (X);      /* Perform the addition of terms and include the
                                  result node as the last node of list P1 + P2 */
        COEFF (X) = COEFF (PTR1) + COEFF (PTR2);
        EXPONENT (X) = EXPONENT (PTR1);      /* or EXPONENT (PTR2) */
        LINK (X) = NIL;
        Add node X as the last node of the list P1 + P2;
      }

   if (EXPONENT (PTR1) < EXPONENT (PTR2)) then
      /* PTR1 and PTR2 do not point to like terms. Duplicate the node representing
         the highest power, i.e. EXPONENT (PTR2), and insert it as the last node
         in P1 + P2 */
      { Call GETNODE (X);
        COEFF (X) = COEFF (PTR2);
        EXPONENT (X) = EXPONENT (PTR2);
        LINK (X) = NIL;
        Add node X as the last node of list P1 + P2;
      }

   if (EXPONENT (PTR1) > EXPONENT (PTR2)) then
      /* PTR1 and PTR2 do not point to like terms. Hence duplicate the node
         representing the highest power, i.e. EXPONENT (PTR1), and insert it as
         the last node of P1 + P2 */
      { Call GETNODE (X);
        COEFF (X) = COEFF (PTR1);
        EXPONENT (X) = EXPONENT (PTR1);
        LINK (X) = NIL;
        Add node X as the last node of list P1 + P2;
      }


If either list exhausts its nodes before the other during the course of addition, the remaining
nodes of the other list are simply appended to the list $P_1 + P_2$ in the order of their occurrence in
their original list.
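
The three cases above, together with the appending of leftover terms, can be put together as the following Python sketch. It is an illustration rather than the book's procedure: the names `Term`, `make_poly`, `add_poly` and `to_pairs` are assumptions, a temporary head node is used to simplify appending, and the advancing of PTR1 and PTR2 after each case (not spelled out in the crux above) is made explicit.

```python
class Term:
    """One term of a polynomial: COEFF, EXPONENT and LINK fields."""
    def __init__(self, coeff, exponent, link=None):
        self.coeff, self.exponent, self.link = coeff, exponent, link


def make_poly(pairs):
    """Build a list from (coeff, exponent) pairs given in decreasing exponent order."""
    head = None
    for coeff, exponent in reversed(pairs):
        head = Term(coeff, exponent, head)
    return head


def add_poly(p1, p2):
    """Return a new list representing p1 + p2; p1 and p2 are left unchanged."""
    dummy = tail = Term(0, 0)              # temporary head node, dropped on return
    ptr1, ptr2 = p1, p2
    while ptr1 is not None and ptr2 is not None:
        if ptr1.exponent == ptr2.exponent:             # like terms
            total = ptr1.coeff + ptr2.coeff
            if total != 0:
                tail.link = Term(total, ptr1.exponent)
                tail = tail.link
            ptr1, ptr2 = ptr1.link, ptr2.link
        elif ptr1.exponent < ptr2.exponent:            # copy the higher power from p2
            tail.link = Term(ptr2.coeff, ptr2.exponent)
            tail = tail.link
            ptr2 = ptr2.link
        else:                                          # copy the higher power from p1
            tail.link = Term(ptr1.coeff, ptr1.exponent)
            tail = tail.link
            ptr1 = ptr1.link
    rest = ptr1 if ptr1 is not None else ptr2          # leftover terms of the longer list
    while rest is not None:
        tail.link = Term(rest.coeff, rest.exponent)
        tail = tail.link
        rest = rest.link
    return dummy.link


def to_pairs(p):
    """Flatten a list back into (coeff, exponent) pairs."""
    out = []
    while p is not None:
        out.append((p.coeff, p.exponent))
        p = p.link
    return out


p1 = make_poly([(2, 6), (1, 3), (5, 1), (4, 0)])
p2 = make_poly([(7, 6), (8, 5), (-9, 3), (10, 2), (14, 0)])
print(to_pairs(add_poly(p1, p2)))
# [(9, 6), (8, 5), (-8, 3), (10, 2), (5, 1), (18, 0)]
```

Since every node of both lists is visited once, the addition takes time proportional to the total number of terms.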

In case of polynomials of two variables x, y or three variables x, y, z the node structures are 
as shown in Fig. 6.27. 


(a) Node structure of a polynomial in two variables: COEFFICIENT | EXPONENT X | EXPONENT Y | LINK
(b) Node structure of a polynomial in three variables: COEFFICIENT | EXPONENT X | EXPONENT Y | EXPONENT Z | LINK

Fig. 6.27 Node structures of polynomials in two/three variables


Here COEFFICIENT refers to the coefficient of the term in the polynomial represented by the 
node. EXPONENT X, EXPONENT Y and EXPONENT Z are the exponents of the variables x, y 
and z respectively. 


Sparse matrix representation 


The concept of sparse matrices was discussed in Chapter 3. An array representation for the
efficient representation and manipulation of sparse matrices was suggested in Sec. 3.5. In this
section we present a linked representation for the sparse matrix, as an illustration of a multiply
linked list.

Consider a sparse matrix shown in Fig. 6.28(a). The node structure for the linked 
representation of the sparse matrix is shown in Fig. 6.28(b). Each non-zero element of the matrix 
is represented using the node structure. Here ROW, COL and DATA fields record the row, 
column and value of the non-zero element in the matrix. The RIGHT link points to the node 






   0 1 0 0 0 0
   0 0 0 0 0 0
   2 0 0 1 0 0
   0 0 0 0 0 0
   0 3 0 0 0 0
   0 0 0 0 0 0
   0 0 0 0 0 1
(a) Sparse matrix

Node fields: ROW | COL | DATA | RIGHT | DOWN
(b) Node structure of the multiply linked list

Fig. 6.28 A sparse matrix and the node structure for its representation as a multiply linked list


holding the next non-zero value in the same row of the matrix. The DOWN link points to the node
holding the next non-zero value in the same column of the matrix. Thus, each non-zero value is
linked to its row-wise and column-wise non-zero neighbours, and the linked representation does
not represent the zeros in the matrix at all. Each of the two link fields threads the nodes into
singly linked lists with head nodes. All the nodes representing the non-zero elements of a row
in the matrix link themselves (through the RIGHT link) to form a singly linked list with a head
node. The number of such lists is equal to the number of rows in the matrix which contain at least
one non-zero element. Similarly, all the nodes representing the non-zero elements of a column in
the matrix link themselves (through the DOWN link) to form a singly linked list with a head node.
The number of such lists is equal to the number of columns in the matrix which contain at least
one non-zero element. All the head nodes are also linked together to form a singly linked list. The
head nodes of the row lists have their COL fields set to zero and the head nodes of the column
lists have their ROW fields set to zero. The head node of all head nodes, indicated by START,
stores the dimensions of the original matrix in its ROW and COL fields. Figure 6.29 shows the
multiply linked list representation of the sparse matrix shown in Fig. 6.28(a).
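
A minimal Python sketch of this representation follows. Several details are assumptions made for the illustration and are not confirmed by the text: the names `Node`, `build`, `by_rows` and `by_columns` are hypothetical, the row head nodes are chained through their DOWN links and the column head nodes through their RIGHT links starting from START, rows and columns are numbered from 1, and the 3 x 3 sample matrix is made up.

```python
class Node:
    """ROW, COL, DATA fields with RIGHT and DOWN links."""
    def __init__(self, row, col, data=None):
        self.row, self.col, self.data = row, col, data
        self.right = None   # next non-zero element in the same row
        self.down = None    # next non-zero element in the same column


def build(matrix):
    """Return START for the linked representation of a list-of-lists matrix."""
    start = Node(len(matrix), len(matrix[0]))      # START stores the dimensions
    columns = {}                                   # column index -> nodes in row order
    last_row_head = start
    for i, values in enumerate(matrix, 1):
        tail = None
        for j, value in enumerate(values, 1):
            if value == 0:
                continue                           # zeros are not represented
            if tail is None:                       # first non-zero of row i: add a head
                tail = Node(i, 0)                  # row head node has COL = 0
                last_row_head.down = tail
                last_row_head = tail
            node = Node(i, j, value)
            tail.right = node
            tail = node
            columns.setdefault(j, []).append(node)
    last_col_head = start
    for j in sorted(columns):
        head = Node(0, j)                          # column head node has ROW = 0
        last_col_head.right = head
        last_col_head = head
        tail = head
        for node in columns[j]:
            tail.down = node
            tail = node
    return start


def by_rows(start):
    """List (row, col, value) triples by walking every row list."""
    out, head = [], start.down
    while head is not None:
        node = head.right
        while node is not None:
            out.append((node.row, node.col, node.data))
            node = node.right
        head = head.down
    return out


def by_columns(start):
    """List (row, col, value) triples by walking every column list."""
    out, head = [], start.right
    while head is not None:
        node = head.down
        while node is not None:
            out.append((node.row, node.col, node.data))
            node = node.down
        head = head.right
    return out


START = build([[0, 1, 0],
               [2, 0, 3],
               [0, 0, 0]])
print(by_rows(START))      # [(1, 2, 1), (2, 1, 2), (2, 3, 3)]
print(by_columns(START))   # [(2, 1, 2), (1, 2, 1), (2, 3, 3)]
```

Note that the third row of the sample matrix has no non-zero element and therefore gets no row list, in line with the count of lists stated above.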


ADT for Links


Data objects:
   Addresses of the nodes holding data and null links
Operations:
   • Allocate node (address X) from Available Space to accommodate data
        GETNODE (X)
   • Return node (address X) after use to Available Space
        RETURN (X)
   • Store a value of one link variable LINK1 to another link variable LINK2
        STORE_LINK (LINK1, LINK2)
   • Store ITEM into a node whose address is X
        STORE_DATA (X, ITEM)
   • Retrieve ITEM from a node whose address is X
        RETRIEVE_DATA (X, ITEM)







Fig. 6.29 Multiply linked representation of the sparse matrix shown in Fig. 6.28(a)


ADT for Singly Linked Lists


Data objects:
   A list of nodes each holding one (or more) data field(s) DATA and a single link
   field LINK. LIST points to the start node of the list.
Operations:
   • Check if list LIST is empty
        CHECK_LIST_EMPTY (LIST) (Boolean function)
   • Insert ITEM into the list LIST as the first element
        INSERT_FIRST (LIST, ITEM)
   • Insert ITEM into the list LIST as the last element
        INSERT_LAST (LIST, ITEM)
   • Insert ITEM into the list LIST in order
        INSERT_ORDER (LIST, ITEM)
   • Delete the first node from the list LIST
        DELETE_FIRST (LIST)
   • Delete the last node from the list LIST
        DELETE_LAST (LIST)







   • Delete ITEM from the list LIST
        DELETE_ELEMENT (LIST, ITEM)
   • Advance Link to traverse down the list
        ADVANCE_LINK (LINK)
   • Store ITEM into a node whose address is X
        STORE_DATA (X, ITEM)
   • Retrieve data of a node whose address is X and return it in ITEM
        RETRIEVE_DATA (X, ITEM)
   • Retrieve link of a node whose address is X and return the value in LINK1
        RETRIEVE_LINK (X, LINK1)


Summary 


Sequential data structures suffer from the drawbacks of inefficient implementation of
insert/delete operations and inefficient use of memory.


A linked representation serves to rectify these drawbacks. However, it calls for the
implementation of mechanisms such as GETNODE(X) and RETURN(X) to reserve a node
for use and return the same to the free pool after use, respectively.


A singly linked list is the simplest linked representation, with one or more data fields
but a single link field in its node structure that points to the successor node. However, such
a list is less flexible and does not support an elegant implementation of operations such as
deletion.


A circularly linked list is an enhancement of the singly linked list representation, in that 
the nodes are circularly linked. This not only provides better flexibility, but also results in 
a better rendering of the delete operation. 


A doubly linked list has one or more data item fields but two links, LLINK and RLINK,
pointing to the predecessor and successor of the node respectively. Though the list exhibits
the advantages of greater flexibility and an efficient delete operation, it suffers from the
drawback of increased storage requirement for the node structure in comparison to other
linked representations.

A multiply linked list is a linked representation with one or more data item fields and 
multiple link fields. A multiply linked list in its simplest form may represent a cluster of 
singly linked lists networked together. 

The application of linked lists has been demonstrated on two problems viz., Addition of 
Polynomials and linked representation of a sparse matrix. 







Illustrative Problems


Problem 6.1 Write a pseudocode procedure to insert NEW_DATA as the first element in a 
singly linked list T. 
Solution: We shall write a general procedure which will take care of the cases, 
(i) T is initially empty 
(ii) T is non empty 
The logical representation of the list T before and after insertion of NEW_DATA, for the two
cases listed above, is shown in Fig. I 6.1.

[Figure: list T before and after insertion of NEW_DATA, for T empty and T non-empty]
Fig. I 6.1

The general procedure in pseudocode:

   procedure INSERT_SL_FIRST (T, NEW_DATA)
      Call GETNODE (X);
      DATA (X) = NEW_DATA;
      if (T = NIL) then { LINK (X) = NIL; }
      else { LINK (X) = T; }
      T = X;      /* X becomes the first node in both cases */
   end INSERT_SL_FIRST.


Problem 6.2 Write a pseudocode procedure to insert NEW_DATA as the kth element (k > 1)
in a non-empty singly linked list T.


Solution: The logical representation of the list T before and after insertion of NEW_DATA as
the kth element in the list is shown in Fig. I 6.2.

[Figure: list T with n nodes before the insertion, and with n + 1 nodes after node X holding NEW_DATA is inserted in the kth position]
Fig. I 6.2

The pseudocode procedure is:

   procedure INSERT_SL_K (T, k, NEW_DATA)
      Call GETNODE (X);
      DATA (X) = NEW_DATA;
      COUNT = 1;
      TEMP = T;
      while (COUNT ≠ k) do
         PREVIOUS_PTR = TEMP;      /* Remember the address of the predecessor node */
         TEMP = LINK (TEMP);       /* TEMP slides down the list */
         COUNT = COUNT + 1;
      endwhile
      LINK (PREVIOUS_PTR) = X;
      LINK (X) = TEMP;
   end INSERT_SL_K.


Problem 6.3 Write a pseudocode procedure to delete the last element of a singly linked
list T.
Solution: The logical representation of list T before and after deletion of the last element is
shown in Fig. I 6.3.

[Figure: list T before and after deletion of the last element]
Fig. I 6.3


The pseudocode procedure is

   procedure DELETE_LAST (T)
      if (T = NIL) then { call ABANDON_DELETE; }
      else
      { if (LINK (T) = NIL) then      /* T has a single node; the list becomes empty */
           { call RETURN (T); T = NIL; }
        else
           { TEMP = T;
             while (LINK (TEMP) ≠ NIL) do
                PREVIOUS_PTR = TEMP;      /* slide down the list in search of the last node */
                TEMP = LINK (TEMP);
             endwhile
             LINK (PREVIOUS_PTR) = NIL;
             call RETURN (TEMP);
           }
      }
   end DELETE_LAST.
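
The procedures of Problems 6.1 to 6.3 can be sketched in Python as follows. This is an illustration under assumptions: the names `Node`, `insert_first`, `insert_kth`, `delete_last` and `to_list` are hypothetical, `None` plays the role of NIL, and since Python has no reference parameters each function returns the (possibly new) start pointer instead of updating T in place.

```python
class Node:
    """Singly linked list node with DATA and LINK fields."""
    def __init__(self, data, link=None):
        self.data, self.link = data, link


def insert_first(t, new_data):
    """Problem 6.1: also works when the list is empty (t is None)."""
    return Node(new_data, t)


def insert_kth(t, k, new_data):
    """Problem 6.2: insert as the k-th element (k > 1) of a non-empty list."""
    previous = t
    for _ in range(k - 2):             # walk to node k - 1
        previous = previous.link
        if previous is None:
            raise IndexError("k is larger than the list length + 1")
    previous.link = Node(new_data, previous.link)
    return t


def delete_last(t):
    """Problem 6.3: remove the last node and return the start pointer."""
    if t is None:
        raise IndexError("cannot delete from an empty list")
    if t.link is None:                 # single node: the list becomes empty
        return None
    previous = t
    while previous.link.link is not None:
        previous = previous.link
    previous.link = None
    return t


def to_list(t):
    """Collect the DATA fields in list order."""
    out = []
    while t is not None:
        out.append(t.data)
        t = t.link
    return out


T = None
T = insert_first(T, "b")
T = insert_first(T, "a")
T = insert_kth(T, 3, "d")
T = insert_kth(T, 3, "c")
print(to_list(T))                # ['a', 'b', 'c', 'd']
T = delete_last(T)
print(to_list(T))                # ['a', 'b', 'c']
```

Insertion at the front takes constant time, while insertion at position k and deletion of the last node both require a walk down the list.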


Problem 6.4 Write a pseudocode procedure to count the number of nodes in a circularly 
linked list with a head node, representing a list of positive integers. Store the count of nodes as 
a negative number in the head node. 

Solution: Let T be a circularly linked list with a head node, representing a list of positive
integers. The logical representation of an example list T and the same after execution of the
pseudocode procedure is shown in Fig. I 6.4.

[Figure: an example list T with a head node, before and after calculation of its length]
Fig. I 6.4

The pseudocode procedure is:

   procedure LENGTH_CLL (T)
      COUNT = 0;
      TEMP = T;
      while (LINK (TEMP) ≠ T) do
         TEMP = LINK (TEMP);
         COUNT = COUNT + 1;
      endwhile
      DATA (T) = -COUNT;      /* store the count as a negative number in the head node */
   end LENGTH_CLL.
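
A Python rendering of the same idea is sketched below. The names `Node`, `make_headed_circular` and `length_cll` are assumptions for the illustration, and the sample values are made up.

```python
class Node:
    """Node with DATA and LINK fields."""
    def __init__(self, data, link=None):
        self.data, self.link = data, link


def make_headed_circular(values):
    """Build a circularly linked list with a head node from a Python list."""
    head = Node(None)
    head.link = head                   # an empty list: the head node points to itself
    tail = head
    for value in values:
        tail.link = Node(value, head)  # the new last node links back to the head
        tail = tail.link
    return head


def length_cll(t):
    """Count the data nodes and store the count, negated, in the head node."""
    count = 0
    temp = t
    while temp.link is not t:          # stop when the traversal is back at the head
        temp = temp.link
        count += 1
    t.data = -count
    return count


T = make_headed_circular([12, 7, 30])
print(length_cll(T), T.data)           # 3 -3
```

Storing the count as a negative number keeps it distinguishable from the positive integers held in the data nodes.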


Problem 6.5 For the circular doubly linked list T with a head node shown in Fig. I 6.5 with 
pointers X, Y, Z as illustrated, write a pseudocode instruction to 


Fig. I 6.5


(i) Express the DATA field of Node 5 

(ii) Express the DATA field of Node 1 as referenced from head node T
(iii) Express the left link of Node 1 as referenced from Node 2 

(iv) Express the right link of Node 4 as referenced from Node 5 
Solution: 

(i) DATA (Z) 

(ii) DATA (RLINK(T)) 
(iii) LLINK(LLINK(X)) 

(iv) RLINK(LLINK(Z )) 


Problem 6.6 Given the following circular doubly linked list Fig. I 6.6(a), fill up the missing 
values in the DATA fields marked “? ” using the clues given. 








Fig. I 6.6(a)


(i) DATA (B) = DATA (LLINK(RLINK(A))) + DATA (LLINK(RLINK(T)))
(ii) DATA (LLINK(B)) = DATA (B) + 10
(iii) DATA (RLINK(RLINK(B))) = DATA (LLINK(LLINK(B)))


Solution:
(i) DATA (B) = DATA (A) + DATA (T)
               (∵ LLINK(RLINK(A)) = A and LLINK(RLINK(T)) = T)
             = 24 + 46
             = 70
(ii) DATA (LLINK(B)) = DATA (B) + 10
                     = 70 + 10
                     = 80
(iii) DATA (RLINK(RLINK(B))) = DATA (A)
                               (∵ LLINK(LLINK(B)) = A)
                             = 24
The updated list T is shown in Fig. I 6.6(b).

Fig. I 6.6(b)


Problem 6.7 In a programming language (Pascal) the declaration of a node in a singly 
linked list is shown in Fig. I 6.7(a). The list referred to for the problem is shown in Fig. I 6.7(b). 
Given P to be a pointer to a node, the instructions DATA(P) and LINK(P) referring to the DATA
and LINK fields respectively of the node P, are equivalently represented by P↑.DATA and
P↑.LINK in the programming language.

What do the following commands do to the logical representation of the list T? 


TYPE
   POINTER = ↑NODE;
   NODE = RECORD
             DATA : integer;
             LINK : POINTER
          END;
VAR P, Q, R : POINTER;


(a) Declaration of a node in a singly linked list T

(b) A singly linked list T


Fig. I 6.7 (a-b) Declaration of a node in a programming language and the logical representation
of a singly linked list T


(i) P↑.DATA := Q↑.DATA + R↑.DATA
(ii) Q := P
(iii) R↑.LINK := Q
(iv) R↑.DATA := Q↑.LINK↑.DATA + 10







Solution: The logical representation of list T after every command is shown in Fig. I 6.7 (c, d, e, f)
respectively.

Fig. I 6.7 (c-f) Logical representation of list T after execution of commands (i) - (iv) of Illustrative
Problem 6.7


(i) P↑.DATA := Q↑.DATA + R↑.DATA
    P↑.DATA = 57 + 91 = 148
(ii) Q := P
    Here Q is reset to point to the node pointed to by P.
(iii) R↑.LINK := Q
    The link field of node R is reset to point to Q. In other words, the list T turns into a circularly
    linked list!
(iv) R↑.DATA := Q↑.LINK↑.DATA + 10
             = 57 + 10
             = 67


Problem 6.8 Given the logical representations of a list T and the update in its links, write
a one-line instruction which will effect the change indicated. The solid lines indicate the existing
pointers and broken lines the updated links.


Solution: 
(i) RLINK (RLINK(X)) = NIL 
or 
RLINK(LLINK(T) ) = NIL 
(ii) LINK(LINK(Y)) = T 
(iii) RLINK(T) = RLINK(RLINK(T)) 





[Figure: lists (i) - (iii) of Problem 6.8; solid lines show the existing links and broken lines the updated links]


Review Questions


The following is a snapshot of a memory which stores a circular doubly linked list
TENNIS_STARS that is head node free. Answer questions 1 to 3 with regard to the list.





1. The number of data elements in the list TENNIS_STARS is 
(a) 3 (b) 2 (c) 5 (d) 7 




2. The successor of ‘graf’ in the list TENNIS_STARS is
   (a) navaratilova (b) sabatini (c) nirupama (d) chris
3. In the list TENNIS_STARS, DATA (RLINK(LLINK(5))) = ______
   (a) mirza (b) graf (c) nirupama (d) chris
4. Given the singly linked list T shown in Fig. R6.4, the following code inserts the node
   containing the data “where_am_i”

   [Figure: singly linked list T with nodes NODE 1 to NODE 4, each with DATA and LINK fields]
   Fig. R6.4

      T = LINK (T)
      P = LINK (LINK (T))
      GETNODE (X)
      DATA (X) = “where_am_i”
      LINK (X) = P
      LINK (LINK(T)) = X

   (a) between nodes 1 and 2 (b) between nodes 2 and 3
   (c) between nodes 3 and 4 (d) after node 4
5. For the singly linked list T shown in Fig. R6.4, after deletion of Node 3, DATA (LINK (LINK
   (a) I (b) AM (c) HERE (d) ALWAYS
6. What is the need for linked representations of lists?
7. What are the advantages of circular lists over singly linked lists?
8. What are the advantages and disadvantages of doubly linked lists over singly linked lists?
9. What is the use of a head node in a linked list?
10. What are the conditions for testing whether a linked list T is empty, if T is a (i) simple singly
    linked list (ii) headed singly linked list (iii) simple circularly linked list and (iv) headed
    circularly linked list?
11. Sketch a multiply linked list representation for the following sparse matrix:


9 0 0 0 
0 
0 


0 
0 
0 0 


N a © 
ON O 


12. Demonstrate the application of singly linked lists for the addition of the polynomials $P_1$ and
    $P_2$ given below:

    $P_1 : 19x^6 + 78x^4 + 6x - 23x^2 - 34$

    $P_2 : 67x^6 + 89x^5 - 23x^3 - 75x^2 - 89x - 21$







Programming Assignments


1. Let X = (x1, x2, ..., xn), Y = (y1, y2, ..., ym) be two lists with a sorted sequence of elements.
   Execute a program to merge the two lists together as a list Z with m + n elements. Implement
   the lists using singly linked list representations.

2. Execute a program which will split a circularly linked list P with n nodes into two circularly
   linked lists P1, P2 with the first ⌊n/2⌋ and the last n - ⌊n/2⌋ nodes of the list P in them,
   respectively.

3. Write a menu driven program which will maintain a list of car models, their price, name of
   the manufacturer, engine capacity etc., as a doubly linked list. The menu should make
   provisions for inserting information pertaining to new car models, deleting obsolete models,
   and updating data such as price, besides answering queries such as listing all car models
   within a price range specified by the client and listing all details given a car model.

4. Students enrolled for a Diploma course in Computer Science opt for two theory courses, an
   elective course and two laboratory courses from a list of courses offered for the programme.
   Design a multiply linked list with the following node structure:

   ROLLNO | NAME | THEORY1 | THEORY2 | LABORATORY1 | LABORATORY2 | ELECTIVE

   A student may change his/her elective course within a week of the enrollment. At the end
   of the period, the department takes count of the number of students who have enrolled for
   a specific course in the theory, laboratory and elective options.
   Execute a program to implement the multiply linked list with provisions to insert nodes and to
   update information, besides generating reports as needed by the department.

5. [Topological Sorting] The problem of topological sorting is to arrange a set of objects
   {O1, O2, ..., On} obeying rules of precedence into a linear sequence such that whenever Oi
   precedes Oj we have i < j. The sorting procedure has wide applications in PERT, linguistics,
   network theory, etc. Thus when a project is made up of a group of activities observing
   precedence relations amongst themselves, it is convenient to arrange the activities in a linear
   sequence to effectively execute the project.
   Again, as another example, while designing a glossary for a book it is essential that the
   terms Wi are listed in a linear sequence such that no term is used before it has been defined.
   Figure P6.5 illustrates topological sorting.
   A simple way to do topological sorting is to look for objects which are not preceded by any
   other objects and release them into the output linear sequence. Remove these objects and
   continue the same with other objects of the network, until the entire set of objects has been
   released into the linear sequence. However, topological sort fails when the network has a
   cycle. In other words, if Oi precedes Oj and Oj precedes Oi the procedure is stalled.
   Design and implement an algorithm to perform topological sort of a sequence of objects
   using a linked list data structure.








[Figure: a network of activities, where an arrow from activity i to activity j means that activity i is to be completed before activity j, followed by two topological sortings of the activities in the network]

Linear sequence: 1-3-2-5-4-6-7
Linear sequence: 3-1-2-5-4-6-7

Fig. P6.5 Topological sorting of a network


CHAPTER 7

LINKED STACKS AND LINKED QUEUES


7.1 Introduction
7.2 Operations on Linked Stacks and Linked Queues
7.3 Dynamic Memory Management and Linked Stacks
7.4 ...
7.5 Applications


In Chapters 4 and 5 we discussed a sequential representation of the stack and queue data
structures. Stacks and queues were implemented using arrays and hence inherited all the
drawbacks of the sequential data structure.

In this chapter, we discuss the representation of stacks and queues using a linked
representation viz., singly linked lists. The inherent merits of the linked representation render
an efficient implementation of the linked stack and linked queue.

We first define a linked representation of the two data structures and discuss the insert/delete
operations performed on them. The role played by the linked stack in the management of the
free storage pool is detailed. The applications of linked stacks and queues in the problems of
balancing symbols and polynomial representation respectively, are discussed last.

Introduction 7.1 


To review, a stack is an ordered list with the restriction that elements are added or deleted from
only one end of the stack, termed the top of stack, with the ‘inactive’ end known as the bottom of
stack. A stack observes the last-in-first-out (LIFO) principle and has its insert and delete
operations referred to as Push and Pop respectively.

The drawbacks of a sequential representation of a stack data structure are

(i) finite capacity of the stack and
(ii) checking for the STACK_FULL condition every time a Push operation is effected.

A queue on the other hand is a linear list in which all insertions are made at one end of the
list known as the rear end and all deletions are made at the opposite end known as the front end.
The queue observes a first-in-first-out (FIFO) principle and the insert and delete operations are
known as enqueuing and dequeuing respectively.

The drawbacks of sequential representation of a queue are

(i) finite capacity of the queue, and
(ii) checking for the QUEUE_FULL condition before every insert operation is executed, both in
the case of a linear queue and a circular queue.
We now discuss linked representations of a stack and a queue.


Linked stack 


A linked stack is a linear list of elements commonly implemented as a singly linked list whose
start pointer performs the role of the top pointer of a stack. Let a, b, c be a list of elements.
Figure 7.1(a-c) shows the conventional, sequential and linked representations of the stack.


(a) Conventional representation of a stack
(b) Sequential representation of a stack
(c) Linked representation of a stack
(d) Conventional representation of a queue
(e) Sequential representation of a queue
(f) Linked representation of a queue

Fig. 7.1 Stack and queue representation (conventional, sequential and linked)


Here, the start pointer of the linked list is appropriately renamed TOP to suit the context. 


Linked queues 


A linked queue is also a linear list of elements commonly implemented as a singly linked list but 
with two pointers, viz., FRONT and REAR. The start pointer of the singly linked list plays the 
role of FRONT while the pointer to the last node is set to play the role of REAR. Let a, b, c, be 
a list of three elements to be represented as a linked queue. Figure 7.1(d-f) shows the 
conventional, sequential and linked representation of the queue. 


Operations on Linked Stacks and Linked Queues 7.2 


In this section we discuss the insert and delete operations performed on the linked stack and 
linked queue data structures and present algorithms for the same. 







Linked stack operations 


To push an element into the linked stack we insert the node representing the element as the first 
node in the singly linked list. The top pointer which points to the first element in the singly 
linked list is automatically updated to point to the new top element. In the case of a pop 
operation, the node pointed to by the TOP pointer is deleted and TOP is updated to point to the 
next node as the top element. Figures 7.2(a-b) illustrate the push and pop operation on a linked 
stack S. 






(a) Push operation: ‘d’ is pushed into the linked stack S
(b) Pop operation: ‘c’ is popped from the linked stack S

Fig. 7.2 Push and pop operation on a linked stack S


Observe how during the push operation, unlike sequential stack structures, there is no need
to check for the STACK-FULL condition, since the capacity of the linked stack is limited only by
the storage available for new nodes.


Linked queue operations 


To insert an element into the queue we insert the node representing the element as the last node 
in the singly linked list for which the REAR pointer is reset to point to the new node as the rear 
element of the queue. To delete an element from the queue, we remove the first node of the list 
for which the FRONT pointer is reset to point to the next node as the front element of the queue. 
Figure 7.3(a-b) illustrates the insert and delete operations on a linked queue. 


(a) Insert operation: ‘d’ is inserted into the linked queue Q
(b) Delete operation: the front element is deleted from the linked queue Q

Fig. 7.3 Insert and delete operations on the linked queue Q







The insert operation, unlike insertion in sequential queues, does not need to check 
for the QUEUE_FULL condition, owing to the unlimited capacity of the data structure. The introduction 
of circular queues to annul the drawbacks of linear queues now appears superfluous in the 
light of the linked representation of queues. 

Both linked stacks and queues could be represented using a singly linked list with a head 
node. Also they could be represented as a circularly linked list provided the fundamental 
principles of LIFO and FIFO are strictly maintained. 

We now present the algorithms for the operations discussed in linked stacks and linked 
queues. 


Algorithms for push/pop operations on a linked stack 


Let S be a linked stack. Algorithms 7.1 and 7.2 illustrate the push and pop operations to be carried 
out on the stack S. 


Algorithm 7.1: Push item ITEM into a linked stack S with top pointer TOP 


procedure PUSH_LINKSTACK(TOP, ITEM) 
/* Insert ITEM into stack */ 
   Call GETNODE(X) 
   DATA(X) = ITEM     /* store ITEM in node X */ 
   LINK(X) = TOP      /* insert node X into stack */ 
   TOP = X            /* reset TOP pointer */ 
end PUSH_LINKSTACK. 





Note the absence of the STACK FULL condition. The time complexity of a push operation is 
O(1). 


Algorithm 7.2: Pop from a linked stack S and output the element through ITEM 


procedure POP_LINKSTACK(TOP, ITEM) 
/* pop element from stack and set ITEM to the element */ 
   if (TOP = 0) then call LINKSTACK_EMPTY 
   /* check if linked stack is empty */ 
   else { TEMP = TOP 
          ITEM = DATA(TOP) 
          TOP = LINK(TOP) 
          call RETURN(TEMP);   /* return the node TEMP to the free pool */ 
        } 
end POP_LINKSTACK. 

The time complexity of a pop operation is O(1). Example 7.1 illustrates the push and pop 
operation on a linked stack. 
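
The push and pop steps of Algorithms 7.1 and 7.2 can be sketched in Python as follows. This is a minimal illustration, not the book's implementation: the names `Node`, `LinkedStack`, `push` and `pop` are assumptions, Python's `None` stands in for the nil pointer, and node allocation is left to the language runtime instead of GETNODE and RETURN.

```python
class Node:
    """A singly linked node with DATA and LINK fields (illustrative names)."""
    def __init__(self, data, link=None):
        self.data = data
        self.link = link


class LinkedStack:
    def __init__(self):
        self.top = None              # TOP is nil for an empty stack

    def push(self, item):
        # The new node becomes the first node and TOP is reset to it: O(1).
        self.top = Node(item, self.top)

    def pop(self):
        if self.top is None:
            raise IndexError("pop from empty linked stack")
        node = self.top
        self.top = node.link         # TOP moves to the next node: O(1).
        return node.data


device = LinkedStack()
for name in ("PEN", "PLOTTER", "JOYSTICK"):
    device.push(name)
popped = device.pop()                # removes the most recently pushed element
device.push("PRINTER")
```

Neither operation traverses the list, which is why both run in constant time, and no capacity check appears anywhere in `push`.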


Example 7.1 Consider the stack DEVICE of peripheral devices illustrated in Example 4.1. 
We implement the same as a linked stack. The insertion of PEN, PLOTTER, JOYSTICK and 
PRINTER and a deletion operation are illustrated in Table 7.1. We assume the list to be initially 
empty and TOP to be the top pointer of the stack. 







Table 7.1 Insert and delete operations on linked stack DEVICE 

| Stack operation | Stack DEVICE before operation | Algorithm invocation | Stack DEVICE after operation | Remarks |
|---|---|---|---|---|
| 1. Push 'PEN' into DEVICE | Top -> (empty) | PUSH_LINKSTACK(Top, 'PEN') | Top -> PEN | Set Top to point to the first node. |
| 2. Push 'PLOTTER' into DEVICE | Top -> PEN | PUSH_LINKSTACK(Top, 'PLOTTER') | Top -> PLOTTER -> PEN | Insert PLOTTER as the first node and reset Top. |
| 3. Push 'JOYSTICK' into DEVICE | Top -> PLOTTER -> PEN | PUSH_LINKSTACK(Top, 'JOYSTICK') | Top -> JOYSTICK -> PLOTTER -> PEN | Insert JOYSTICK as the first node and reset Top. |
| 4. Pop from DEVICE | Top -> JOYSTICK -> PLOTTER -> PEN | POP_LINKSTACK(Top, ITEM) | Top -> PLOTTER -> PEN; ITEM = 'JOYSTICK' | Return the first node and reset Top. |
| 5. Push 'PRINTER' into DEVICE | Top -> PLOTTER -> PEN | PUSH_LINKSTACK(Top, 'PRINTER') | Top -> PRINTER -> PLOTTER -> PEN | Insert PRINTER as the first node and reset Top. |


Algorithms for insert and delete operations in a linked queue 





Let Q be a linked queue. Algorithms 7.3 and 7.4 illustrate the insert and delete operations on the 
queue Q. 




Algorithm 7.3: Insert item ITEM into a linked queue Q with FRONT and REAR as the 
front and rear pointers to the queue 


procedure INSERT_LINKQUEUE(FRONT, REAR, ITEM) 
   Call GETNODE(X); 
   DATA(X) = ITEM; 
   LINK(X) = NIL;   /* Node with ITEM is ready to be inserted into Q */ 
   if (FRONT = 0) then FRONT = REAR = X; 
   /* If Q is empty then ITEM is the first element in the queue */ 
   else { LINK(REAR) = X; 
          REAR = X 
        } 
end INSERT_LINKQUEUE. 


Observe the absence of the QUEUE_FULL condition in the insert procedure. The time complexity 
of an insert operation is O(1). 


Algorithm 7.4: Delete element from the linked queue Q through ITEM with FRONT and 
REAR as the front and rear pointers 


procedure DELETE_LINKQUEUE(FRONT, ITEM) 
   if (FRONT = 0) then call LINKQUEUE_EMPTY; 
   /* Test condition to avoid deletion in an empty queue */ 
   else { TEMP = FRONT; 
          ITEM = DATA(TEMP); 
          FRONT = LINK(TEMP); 
          call RETURN(TEMP);   /* return the node TEMP to the free pool */ 
        } 
end DELETE_LINKQUEUE. 





The time complexity of a delete operation is O(1). Example 7.2 illustrates the insert and delete 
operations on a linked queue. 
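
The insert and delete steps of Algorithms 7.3 and 7.4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `Node`, `LinkedQueue`, `insert` and `delete` are hypothetical, `None` stands in for the nil pointer, and, unlike Algorithm 7.4, the sketch also clears the rear pointer when the queue becomes empty.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.link = None


class LinkedQueue:
    def __init__(self):
        self.front = None
        self.rear = None

    def insert(self, item):
        node = Node(item)
        if self.front is None:       # empty queue: node is both front and rear
            self.front = self.rear = node
        else:
            self.rear.link = node    # append after the current rear node
            self.rear = node

    def delete(self):
        if self.front is None:
            raise IndexError("delete from empty linked queue")
        node = self.front
        self.front = node.link
        if self.front is None:       # queue became empty, so clear REAR too
            self.rear = None
        return node.data


birds = LinkedQueue()
for name in ("DOVE", "PEACOCK", "PIGEON"):
    birds.insert(name)
first = birds.delete()
birds.insert("SWAN")
second = birds.delete()
```

Keeping a separate rear pointer is what makes insertion O(1); with only a front pointer, every insert would have to walk the whole list.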


Example 7.2 Consider the queue BIRDS illustrated in Example 5.1. The insertion of DOVE, 
PEACOCK, PIGEON and SWAN, and two deletions are shown in Table 7.2. 

Owing to the linked representation, there is no fixed limit on the capacity of the stack or queue. 
In fact, the stack or queue can hold as many elements as the storage memory can accommodate. 
This dispenses with the need to check for STACK_FULL or QUEUE_FULL conditions during 
push or insert operations respectively. 

The merits of linked stacks and linked queues are therefore 

(i) the conceptual and computational simplicity of the operations 

(ii) non-finite capacity 

The only demerit is the additional space needed to accommodate the link fields. 





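Table 7.2 Insert and delete operations on a linked queue BIRDS 

| Linked queue operation | Linked queue before operation | Algorithm invocation | Linked queue after operation | Remarks |
|---|---|---|---|---|
| 1. Insert 'DOVE' into BIRDS | Front, Rear -> (empty) | INSERT_LINKQUEUE(Front, Rear, 'DOVE') | Front -> DOVE <- Rear | Since the queue BIRDS is empty, insert DOVE as the first node. Front and Rear point to the node. |
| 2. Insert 'PEACOCK' into BIRDS | Front -> DOVE <- Rear | INSERT_LINKQUEUE(Front, Rear, 'PEACOCK') | Front -> DOVE -> PEACOCK <- Rear | Insert PEACOCK as the last node. Reset Rear pointer. |
| 3. Insert 'PIGEON' into BIRDS | Front -> DOVE -> PEACOCK <- Rear | INSERT_LINKQUEUE(Front, Rear, 'PIGEON') | Front -> DOVE -> PEACOCK -> PIGEON <- Rear | Insert PIGEON as the last node. Reset Rear pointer. |
| 4. Delete from BIRDS | Front -> DOVE -> PEACOCK -> PIGEON <- Rear | DELETE_LINKQUEUE(Front, Rear, ITEM) | Front -> PEACOCK -> PIGEON <- Rear; ITEM = 'DOVE' | Delete node pointed to by Front. Reset Front. |
| 5. Insert 'SWAN' into BIRDS | Front -> PEACOCK -> PIGEON <- Rear | INSERT_LINKQUEUE(Front, Rear, 'SWAN') | Front -> PEACOCK -> PIGEON -> SWAN <- Rear | Insert SWAN as the last node. Reset Rear. |
| 6. Delete from BIRDS | Front -> PEACOCK -> PIGEON -> SWAN <- Rear | DELETE_LINKQUEUE(Front, Rear, ITEM) | Front -> PIGEON -> SWAN <- Rear; ITEM = 'PEACOCK' | Delete node pointed to by Front. Reset Front. |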










Dynamic Memory Management and Linked Stacks 


Dynamic memory management deals with methods of allocating storage and recycling unused 
space for future use. The automatic recycling of dynamically allocated memory is also known as 
Garbage collection. 

If the memory storage pool is thought of as a repository of nodes, then dynamic memory 
management primarily revolves round the two actions of allocating nodes (for use by the 
application) and liberating nodes (after their release by the application). Several intelligent 
strategies for the efficient allocation and liberation of nodes have been discussed in the literature. 
However, we choose to discuss this topic from the perspective of a linked stack application. 

Every linked representation, which makes use of nodes to accommodate data elements, 
executes procedure GETNODE( ) to have the desired node allocated to it from the free storage 
pool, and procedure RETURN( ) to dispose of or liberate the node released by it into the storage 
pool. The free storage pool is also referred to as Available Space (AVAIL_SPACE). 

When the application invokes GETNODE( ), a node is deleted from the available space data structure 
and handed over for use by the program; when RETURN( ) is invoked, the node 
disposed of by the application is inserted into the available space for future use. 

The most commonly used data structure for the management of AVAIL_SPACE and its insert / 
delete operations is the linked stack. The free nodes in AVAIL_SPACE are all linked together 
and maintained as a linked stack with a top pointer (AV_SP). When GETNODE( ) is invoked, a 
pop operation on the linked stack releases a node for use by the application, and when 
RETURN( ) is invoked, a push operation on the linked stack is done. Figure 7.4 illustrates the 
association between the GETNODE( ) and RETURN( ) procedures and AVAIL_SPACE maintained 
as a linked stack. 

We now implement the GETNODE( ) and RETURN( ) procedures, which are in fact nothing but 
POP and PUSH operations on the linked stack AVAIL_SPACE. Algorithms 7.5 and 7.6 illustrate 
the implementation of the procedures. 

At any given instant, nodes that are adjacent in the AVAIL_SPACE list need not be 
physically contiguous in memory, while free nodes that are physically contiguous in memory may 
lie scattered in the list. This may eventually lead to holes in the memory and hence to inefficient 
use of memory. When variable size nodes are in use, it is desirable to compact memory so that all 
free nodes form a contiguous block of memory. This is termed memory compaction. 

For efficient management of memory, it therefore becomes essential that the storage manager, every 
time a node is returned to the free pool, coalesces the neighboring blocks of memory that are 
free into a single block of memory, so as to satisfy large requests for memory. This 
is, however, easier said than done. To look for neighboring nodes which are free, a 'brute force 
approach' calls for a complete search through the AVAIL_SPACE list before collapsing the adjacent 
free nodes into a single block. 

Allocation strategies such as Boundary Tag method and Buddy system method, with efficient 
reservation and liberation of nodes have been proposed in the literature. 







[Figure 7.4 panels: (a) Available space before execution of GETNODE( ) procedure. (b) Available space after execution of GETNODE( ) procedure: node X is taken from AVAIL_SPACE for the application. (c) Available space before execution of RETURN( ) procedure. (d) Available space after execution of RETURN( ) procedure: node X is pushed into AVAIL_SPACE.]


Fig. 7.4 Association between GETNODE( ), RETURN( ) procedures and AVAIL_SPACE 







Algorithm 7.5: Implementation of procedure GETNODE (X) where AV is the pointer to 
the linked stack implementation of AVAIL_SPACE 


procedure GETNODE(X) 
   if (AV = 0) then call NO_FREE_NODES; 
   /* AVAIL_SPACE has no free nodes to allocate */ 
   else { X = AV; 
          AV = LINK(AV);   /* Return the address X of the top node in AVAIL_SPACE */ 
        } 
end GETNODE. 


Algorithm 7.6: Implementation of procedure RETURN (X) where AV is the pointer to the 
linked stack implementation of AVAIL_SPACE 


procedure RETURN(X) 
   LINK(X) = AV;   /* Push node X into AVAIL_SPACE and reset AV */ 
   AV = X; 
end RETURN. 
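
The behaviour of GETNODE and RETURN on AVAIL_SPACE can be simulated with two parallel arrays standing in for the DATA and LINK fields of the memory. The following Python sketch is illustrative only: the class `Storage`, the method names `getnode` and `return_node`, and the choice of address 0 as the nil address are assumptions, and the pool is assumed to start with every node free.

```python
NIL = 0                                   # address 0 plays the role of nil


class Storage:
    def __init__(self, size):
        # Addresses run from 1 to size; index 0 is unused.
        self.data = [None] * (size + 1)
        self.link = [NIL] * (size + 1)
        # Initially every node is free: chain 1 -> 2 -> ... -> size.
        for addr in range(1, size):
            self.link[addr] = addr + 1
        self.av = 1 if size > 0 else NIL  # AV: top of the AVAIL_SPACE stack

    def getnode(self):
        """Pop a node address from AVAIL_SPACE."""
        if self.av == NIL:
            raise MemoryError("no free nodes")
        x = self.av
        self.av = self.link[x]
        self.link[x] = NIL
        return x

    def return_node(self, x):
        """Push node address x back into AVAIL_SPACE."""
        self.data[x] = None
        self.link[x] = self.av
        self.av = x


mem = Storage(4)
a = mem.getnode()
b = mem.getnode()
mem.return_node(a)
c = mem.getnode()      # the most recently returned node is reused first
```

Because AVAIL_SPACE is a stack, the node returned last is the one allocated next, so both operations touch only the top pointer and run in O(1).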


Implementation of Linked Representations 





It is emphasized here that nodes belonging to the reserved pool, that is, nodes which are currently 
in use, coexist with the nodes of the free pool in the same storage area. It is therefore not 
uncommon to have a reserved node with a free node as its physically contiguous neighbor. 
While the link fields of the free nodes, which in the simplest form are organized as a linked stack, keep track 
of the free nodes in the list, the link fields of the reserved pool similarly keep track of the reserved 
nodes in the list. Figure 7.5 illustrates a simple scheme of the reserved pool intertwined with the free 
pool in the memory storage. 


[Figure 7.5: memory storage holding reserved storage and the free storage pool. AV: pointer to AVAIL_SPACE. START: pointer to a linked reserved pool of an application.]


Fig. 7.5 The scheme of reserved storage pool and free storage pool in the memory storage 


Example 7.3 illustrates the implementation of a linked representation. For simplicity we 
consider a singly linked list occupying the reserved pool. 




Example 7.3 A snapshot of the memory storage is shown in Fig. 7.6. The reserved pool 
accommodates a singly linked list (START). The free storage pool of used and disposed nodes is 
maintained as a linked stack with top pointer AV. 


[Figure 7.6: memory locations with their DATA and LINK fields, together with the locations AV and START]


Fig. 7.6 A snapshot of the memory accommodating a singly linked list in its reserved pool and 
the free storage pool 





Note the memory locations AV and START. AV records the address of the first node in the free 
storage pool and START the same of the singly linked list in the reserved pool. The logical 
representation of the singly linked list and the available space are illustrated in Fig. 7.7. 


START -> [4] -> [9] -> [5] -> [10] 

AV -> [6] -> [2] -> [8] -> [3] -> [7] 

(node addresses are shown in brackets) 


Fig. 7.7 Logical representation of the singly linked list and the AVAIL_SPACE shown in Fig. 7.6 


Applications 7.5 


All applications of linear queues and linear stacks can be implemented as linked stacks and 
linked queues. In this section we discuss the following problems, 

(i) Balancing symbols, 

(ii) Polynomial representation 
as application of linked stacks and linked queues respectively. 


Balancing symbols 


An important activity performed by compilers is to check for syntax errors in the program code. 
One such error checking mechanism is the balancing of symbols or specifically, balancing of 
parentheses in the context of expressions, which is exclusive to this discussion. 

For the balancing of parentheses, the left parentheses or braces or brackets as allowed by the 
language syntax, must have closing or matching right parentheses, braces or brackets respectively. 
Thus the usage of ( ), or { } or [ ] are correct whereas, (, [, } are incorrect, the former indicative 
of a balanced occurrence and the latter of an imbalanced occurrence in an expression. 







The arithmetic expressions shown in Example 7.4 are balanced in parentheses, while those 
listed in Example 7.5 are imbalanced, forcing the compiler to report errors. 


Example 7.4 Balanced arithmetic expressions 
(i) (A + B) ↑ C - D) + E - F 
(ii) (- (A + B) * (C - D)) ↑ F 


Example 7.5 Imbalanced arithmetic expressions 

(i) (A+ B)*-(C+D+F) 

(ii) (A + B + C) *- (E+ P))) 
The solution to the problem is an easy but elegant use of a stack to check for mismatched 
parentheses. The general pseudocode procedure for the problem is illustrated in Algorithm 7.5. 
Appropriate to the discussion, we choose a linked representation for the stack in the algorithm. 
Examples 7.6 and 7.7 illustrate the working of the algorithm on two expressions with balanced 
and unbalanced symbols respectively. 


Algorithm 7.5: To check for the balancing of parentheses in a string 


procedure BALANCE_EXPR(E) 
/* E is the expression padded with a $ to indicate end of input */ 
   clear stack; 
   while not end_of_string(E) do 
      read character;   /* read a character from string E */ 
      if the character is an open symbol then push character into stack; 
      if the character is a close symbol then 
         if stack is empty then ERROR( ) 
         else { pop the stack; 
                if the character popped is not the matching symbol 
                then ERROR( ); 
              } 
   endwhile 
   if stack not empty then ERROR( ); 
end BALANCE_EXPR. 


Example 7.6 Consider the arithmetic expression ((A+B)* C) — D which has balanced 
parentheses. Table 7.3 illustrates the working of the algorithm on the expression. 


Example 7.7 Consider the expression ((A + B) * C ↑ G which has unbalanced parentheses. 
Table 7.4 illustrates the working of the algorithm on the expression. 
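
The balancing check can be sketched in Python with a linked stack of open symbols. This is a minimal illustration, not the book's code: the names `Node`, `MATCH` and `is_balanced` are assumptions, the function returns False where the pseudocode calls ERROR( ), and the end of the Python string plays the role of the $ end-of-input marker.

```python
class Node:
    def __init__(self, data, link):
        self.data = data
        self.link = link


# Each close symbol mapped to the open symbol it must match.
MATCH = {")": "(", "]": "[", "}": "{"}


def is_balanced(expr):
    top = None                                # empty linked stack
    for ch in expr:
        if ch in "([{":
            top = Node(ch, top)               # push the open symbol
        elif ch in MATCH:
            if top is None:                   # close symbol with nothing open
                return False
            opened, top = top.data, top.link  # pop
            if opened != MATCH[ch]:
                return False
        # every other character is ignored
    return top is None                        # leftovers mean an unclosed symbol


balanced = is_balanced("((A+B)*C)-D")
unbalanced = is_balanced("((A+B)*C^G")
```

The stack never holds more nodes than there are open symbols in the expression, and each character is examined exactly once, so the check is linear in the length of the input.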


Polynomial representation 


In Chapter 6, Sec. 6 we had discussed the problem of addition of polynomials as an application 
of linked lists. In this section, we highlight the representation of polynomials as an application 
of linear queues. 







Consider a polynomial 9x⁶ - 2x⁴ + 3x² + 4. Adopting the node structure shown in Fig. 7.8(a) 
(a reproduction of Fig. 6.26(a)), the linked list for the polynomial is as shown in Fig. 7.8(b). 


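Table 7.3 Working of Algorithm BALANCE_EXPR( ) on the expression ((A + B) * C) - D 

| Input string (E) | Stack (S) | Remarks |
|---|---|---|
| ((A + B) * C) - D $ | (empty) | Initialization. Note E is padded with $ as end of input symbol. |
| (A + B) * C) - D $ | ( | Push '(' into S |
| A + B) * C) - D $ | ( ( | Push '(' into S |
| + B) * C) - D $ | ( ( | Ignore character 'A' |
| B) * C) - D $ | ( ( | Ignore character '+' |
| ) * C) - D $ | ( ( | Ignore character 'B' |
| * C) - D $ | ( | Pop symbol from S. Matching symbol to ')' found. Proceed. |
| C) - D $ | ( | Ignore character '*' |
| ) - D $ | ( | Ignore character 'C' |
| - D $ | (empty) | Pop symbol from S. Matching symbol to ')' found. Proceed. |
| D $ | (empty) | Ignore character '-' |
| $ | (empty) | Ignore character 'D' |
| (none) | (empty) | End of input encountered. Stack is empty. Success. |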







Table 7.4 Working of the algorithm BALANCE_EXPR( ) on the expression ((A + B) * C ↑ G 

| Input string (E) | Stack (S) | Remarks |
|---|---|---|
| ((A + B) * C ↑ G $ | (empty) | Initialization. E is padded with $ as end of input symbol. |
| (A + B) * C ↑ G $ | ( | Push '(' into S |
| A + B) * C ↑ G $ | ( ( | Push '(' into S |
| + B) * C ↑ G $ | ( ( | Ignore character 'A' |
| B) * C ↑ G $ | ( ( | Ignore character '+' |
| ) * C ↑ G $ | ( ( | Ignore character 'B' |
| * C ↑ G $ | ( | Pop symbol from S. Matching symbol to ')' found. Proceed. |
| C ↑ G $ | ( | Ignore character '*' |
| ↑ G $ | ( | Ignore character 'C' |
| G $ | ( | Ignore character '↑' |
| $ | ( | Ignore character 'G' |
| (none) | ( | End of input encountered. Stack is not empty. Error. |


For easy manipulation of the linked list, we represent the polynomial in decreasing order 
of the exponents of the variable (in the case of uni-variable polynomials). The function that reads 
the polynomial can therefore conveniently build the linked list as a linear queue: reading the 
symbolic representation from left to right, it enqueues each term in turn, so that every term 
enqueued carries the next highest exponent. The linear 
queue representation for the polynomial 9x⁶ - 2x⁴ + 3x² + 4 is shown in Fig. 7.9. 







[Figure 7.8: (a) Node structure with the fields COEFF | EXP | LINK, where COEFF is the coefficient of the term and EXP is the exponent of the variable. (b) Linked list representation of the polynomial 9x⁶ - 2x⁴ + 3x² + 4.]


Fig. 7.8 Linked list representation of a polynomial 





After the manipulation of polynomials (addition, subtraction etc.), the resulting 
polynomial can also be elegantly represented as a linear queue. This merely calls for 
enqueueing each just manipulated term into the linear queue. Recall the problem of addition of 
polynomials discussed in Sec. 6.6. Maintaining the added polynomial as a linear queue would 
only call for 'appending' the added terms (sums of the coefficients of terms with like exponents) to the rear 
of the list. However, during the manipulation, the linear queue representations of the polynomials 
are to be treated as traversable queues. A traversable queue, while retaining the operations of 
enqueuing and dequeuing, permits traversal of the list in which nodes may be examined. 
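
The idea of appending each added term to the rear of a result queue can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: `collections.deque` stands in for the linked queue, terms are stored as (coefficient, exponent) pairs in decreasing order of exponent, the function name `add_polynomials` and the second polynomial are hypothetical, and indexing into the inputs models traversal without dequeuing.

```python
from collections import deque


def add_polynomials(p, q):
    """p and q are queues of (coeff, exp) terms in decreasing order of exp."""
    result = deque()
    i, j = 0, 0                       # traversal positions; inputs are not dequeued
    while i < len(p) and j < len(q):
        (c1, e1), (c2, e2) = p[i], q[j]
        if e1 == e2:
            if c1 + c2 != 0:          # drop terms that cancel out
                result.append((c1 + c2, e1))
            i, j = i + 1, j + 1
        elif e1 > e2:
            result.append((c1, e1))   # enqueue at the rear
            i += 1
        else:
            result.append((c2, e2))
            j += 1
    result.extend(list(p)[i:])        # copy whichever tail remains
    result.extend(list(q)[j:])
    return result


p = deque([(9, 6), (-2, 4), (3, 2), (4, 0)])
q = deque([(5, 4), (-3, 2), (1, 1)])
total = add_polynomials(p, q)
```

Since terms are always produced in decreasing order of exponent, appending at the rear is enough to keep the result ordered; no insertion in the middle of the list is ever needed.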


[Figure 7.9: the terms of the polynomial held in a linked list with Front and Rear pointers]


Fig. 7.9 Linear queue representation of the polynomial 9x⁶ - 2x⁴ + 3x² + 4 


Summary 


> Sequential representations of stacks and queues suffer from the limitation of finite capacity, 
besides the need to check for the STACK_FULL and QUEUE_FULL conditions each time a push 
or insert operation, respectively, is executed. 

> Linked stacks and linked queues are singly linked list implementations of stacks and 
queues, though a circularly linked list representation can also be attempted without 
hampering the LIFO or FIFO principle of the respective data structures. 

> Linked stacks and linked queues display the merits of conceptual and computational 
simplicity of insert and delete operations, besides the absence of limited capacity. However, the 
requirement of additional space to accommodate the link fields can be viewed as a demerit. 

> The maintenance of the available space list calls for the application of linked stacks. 

> The problems of balancing of symbols and polynomial representation have been discussed 
to demonstrate the application of the linked stack and the linked queue respectively. 





Illustrative Problems 


Problem 7.1 Given the following memory snapshot where START and AV_SP store the 
start pointers of the linked list and the available space respectively, 
(i) Identify the linked list 
(ii) Show how the linked list and the available space list are affected when the following 
operations are carried out: 







a. Insert 116 at the end of the list 
b. Delete 243 
c. Obtain the memory snap shot after the execution of operations listed in (a) and (b) 


[Memory snapshot: DATA and LINK fields of the memory locations, with the locations START and AV_SP]


Solution: (i) Since the linked list starts at the node whose address is 2, the logical representation of 
the list, shown here in terms of node addresses, is as given below: 

START -> [2] -> [6] -> [8] -> [3] -> [9] -> [10] 

The available space list, which functions as a linked stack and starts from the node whose address is 
4, is given by: 

AV_SP -> [4] -> [5] -> [7] -> [1] 

(a) To insert 116 at the end of the list START, we get a node from the available space list (invoke 
GETNODE( )). The node released has address 4. The resultant list and the available space 
list are as follows: 

START -> [2] -> [6] -> [8] -> [3] -> [9] -> [10] -> [4] 
AV_SP -> [5] -> [7] -> [1] 

(b) To delete 243, the node holding the element has to be returned to the available space list 
(invoke RETURN( )). The resultant list and the available space list are as follows: 

START -> [2] -> [6] -> [8] -> [9] -> [10] -> [4] 
AV_SP -> [3] -> [5] -> [7] -> [1] 







(c) The memory snapshot after the execution of (a) and (b) is as given below: 


[Memory snapshot: DATA and LINK fields of the memory locations after operations (a) and (b), with START: 2 and AV_SP: 3]


Problem 7.2 Given the following memory snapshot which stores a linked stack L_S and a 
linked queue L_Q beginning at the respective addresses, obtain the resulting memory snapshot 
after the following operations are carried out sequentially. 

(i) Enqueue CONCORDE into L_Q. 

(ii) Pop from L_S 
(iii) Dequeue from L_Q 

(iv) Push PALACE_ON_WHEELS into L_S 


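| Address | DATA | LINK |
|---|---|---|
| 1 | AMTRACK | 7 |
| 2 | FALCON | 5 |
| 3 | BOMBAY MAIL | 6 |
| 4 | ICC-246 | 9 |
| 5 | RAJDOOT | 4 |
| 6 | RAJDHANI | 0 |
| 7 | FAST_WIND | 8 |
| 8 | DEVILS EYE | 0 |
| 9 | ORIENT EXPRESS | 0 |
| 10 | BLUE MOUNT | 3 |

L_Q: 1 (FRONT), 8 (REAR)   L_S: 10   AV_SP: 2 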







Solution: It is easier to perform the operations on the logical representations of the lists and 
available space extracted from the memory, before obtaining the final memory snapshot. 
The lists are: 


L_Q: (FRONT) [1] AMTRACK -> [7] FAST_WIND -> [8] DEVILS EYE (REAR) 

L_S: [10] BLUE MOUNT -> [3] BOMBAY MAIL -> [6] RAJDHANI 

AV_SP: [2] FALCON -> [5] RAJDOOT -> [4] ICC-246 -> [9] ORIENT EXPRESS 










(i) Enqueue CONCORDE into L_Q yields: 


L_Q: (FRONT) [1] AMTRACK -> [7] FAST_WIND -> [8] DEVILS EYE -> [2] CONCORDE (REAR) 

AV_SP: [5] RAJDOOT -> [4] ICC-246 -> [9] ORIENT EXPRESS 


Here node 2 is popped from AV_SP to accommodate CONCORDE which is inserted at the 
rear of L_Q. 
(ii) Pop from L_S yields: 


L_S: [3] BOMBAY MAIL -> [6] RAJDHANI 

AV_SP: [10] BLUE MOUNT -> [5] RAJDOOT -> [4] ICC-246 -> [9] ORIENT EXPRESS 


Here node 10 from L_S is deleted and pushed into AV_SP. 











(iii) Dequeue from L_Q yields: 


L_Q: (FRONT) [7] FAST_WIND -> [8] DEVILS EYE -> [2] CONCORDE (REAR) 

AV_SP: [1] AMTRACK -> [10] BLUE MOUNT -> [5] RAJDOOT -> [4] ICC-246 -> [9] ORIENT EXPRESS 

Here, node 1 from L_Q is deleted and pushed into AV_SP. 
(iv) Push "PALACE_ON_WHEELS" into L_S yields: 


L_S: [1] PALACE_ON_WHEELS -> [3] BOMBAY MAIL -> [6] RAJDHANI 

AV_SP: [10] BLUE MOUNT -> [5] RAJDOOT -> [4] ICC-246 -> [9] ORIENT EXPRESS 


Here, node 1 from AV_SP is popped to accommodate “PALACE_ON_WHEELS” before 
pushing the node into L_S. 
The final lists are 






L_Q: (FRONT) [7] FAST_WIND -> [8] DEVILS EYE -> [2] CONCORDE (REAR) 

L_S: [1] PALACE_ON_WHEELS -> [3] BOMBAY MAIL -> [6] RAJDHANI 


The memory snapshot is given by: 


| Address | DATA | LINK |
|---|---|---|
| 1 | PALACE_ON_WHEELS | 3 |
| 2 | CONCORDE | 0 |
| 3 | BOMBAY MAIL | 6 |
| 4 | ICC-246 | 9 |
| 5 | RAJDOOT | 4 |
| 6 | RAJDHANI | 0 |
| 7 | FAST_WIND | 8 |
| 8 | DEVILS EYE | 2 |
| 9 | ORIENT EXPRESS | 0 |
| 10 | BLUE MOUNT | 5 |

L_Q: 7 (FRONT), 2 (REAR)   L_S: 1   AV_SP: 10 





Problem 7.3 Implement an abstract data type STAQUE which is a combination of a linked 
stack and a linked queue. Develop procedures to perform an Insert and a Delete operation, 
termed PUSHINS and POPDEL respectively, on a non empty STAQUE. PUSHINS inserts an 
element at the top or rear of the STAQUE based on an indication given to the procedure and 
POPDEL deletes elements from the top or front of the list. 


Solution: The procedure PUSHINS performs the insertion of an element in the top or rear of the 
list based on whether the STAQUE is viewed as a stack or queue respectively. On the other hand, 
the procedure POPDEL which performs a pop or deletion of element, is common to a STAQUE, 
since in both the cases first element in the list alone is deleted. 


procedure PUSHINS(WHERE, TOP, REAR, ITEM) 
/* WHERE indicates whether the insertion of ITEM 
   is to be done as on a stack or as on a queue */ 
   Call GETNODE(X); 
   DATA(X) = ITEM; 
   if (WHERE = 'Stack') then { LINK(X) = TOP; 
                               TOP = X; 
                             } 
   else { LINK(REAR) = X; 
          LINK(X) = Nil; 
          REAR = X; 
        } 
end PUSHINS 


procedure POPDEL(TOP, ITEM) 
   TEMP = TOP; 
   ITEM = DATA(TEMP);   /* delete top element of the list through ITEM */ 
   TOP = LINK(TEMP); 
   RETURN(TEMP); 
end POPDEL. 


Problem 7.4 Write a procedure to convert a linked stack into a linked queue. 
Solution: An elegant and an easy solution to the problem is to undertake the conversion by 
returning the addresses of the first and last nodes of the linked stack as FRONT and REAR 
thereby turning the linked stack into a linked queue. 


procedure CONVERT_LINKSTACK(TOP, FRONT, REAR) 
/* FRONT and REAR are the variables which return the addresses of 
   the first and last node of the list, converting the linked stack into 
   a linked queue */ 
   if (TOP = Nil) then print("Conversion not possible"); 
   else { FRONT = TOP; 
          TEMP = TOP; 
          while (LINK(TEMP) ≠ Nil) do 
             TEMP = LINK(TEMP); 
          endwhile 
          REAR = TEMP;   /* TEMP now holds the address of the last node */ 
        } 
end CONVERT_LINKSTACK. 


Problem 7.5 An Abstract Data Type STACKLIST is a list of linked stacks stored according 
to a priority factor viz., A, B, C etc, where A means highest priority, B the next and so on. 


Elements having the same priority are stored as a linked stack. The following is a structure of the 
STACKLIST S 


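[Figure: STACKLIST S with a head node followed by the nodes Priority A, Priority B and Priority C, each pointing to the top of its linked stack, Top (Stack A), Top (Stack B) and Top (Stack C)]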





Create a STACKLIST for the following application of Process Scheduling with the processes 
having two priorities, viz., R (Real time) and O(Online) listed within brackets. 


1. Initiate Process P4 ( R) 5. Initiate Process P.( O ) 
2. Initiate Process P, ( O ) 6. Initiate Process Ps (R ) 


3. Initiate Process P, ( O ) 7. Terminate Process in Linked Stack O 
4. Terminate Process in Linked Stack R 8. Initiate Process P, ( R) 










Solution: The STACKLIST at the end of Schedules 1-3 is shown as follows: 

[Figure: STACKLIST with head node, the nodes Priority R and Priority O, and the linked stacks Top (Stack R) and Top (Stack O)]

The STACKLIST at the end of Schedule 4 is as given below: 

[Figure: STACKLIST with head node, the nodes Priority R and Priority O, and the linked stacks Top (Stack R) and Top (Stack O)]

The STACKLIST at the end of Schedules 5-8 is as shown below: 

[Figure: STACKLIST with head node, the nodes Priority R and Priority O, and the linked stacks Top (Stack R) and Top (Stack O)]


Problem 7.6 Write a procedure to reverse a linked stack implemented as a doubly linked 
list, with the original top and bottom positions of the stack reversed as bottom and top respectively. 
For example, a linked stack S and its reversed version are shown below: 

[Figure: linked stack S and its reversed version, each implemented as a doubly linked list]







Solution: An elegant solution would be to merely swap the LLINK and RLINK pointers of each 
node of the doubly linked list to reverse the list, and to remember the address of the last node in the 
original stack S as the TOP pointer. The procedure is given below: 


procedure REVERSE_STACK(TOP) 
/* TEMP and HOLD are temporary variables to hold the addresses of nodes */ 
   TEMP = TOP; 
   repeat 
      HOLD = LLINK(TEMP); 
      LLINK(TEMP) = RLINK(TEMP); 
      RLINK(TEMP) = HOLD;    /* Swap left and right links for each node */ 
      TEMP = LLINK(TEMP);    /* Move to the next node */ 
   until (TEMP = TOP) 
   TOP = RLINK(TEMP); 
end REVERSE_STACK. 


Problem 7.7 What does the following pseudocode do to the linked queue Q with the 
addresses of nodes, as shown below: 


[Figure: linked queue Q with the addresses of its nodes]


procedure WHAT_DO_I_DO(FRONT, REAR) 
/* HAVE, HOLD and HUG are temporary variables to hold the link or data 
   fields of the nodes as the case may be */ 
   HAVE = FRONT; 
   HOLD = DATA(HAVE); 
   while (LINK(HAVE) ≠ Nil) do 
      HUG = DATA(LINK(HAVE)); 
      DATA(LINK(HAVE)) = HOLD; 
      HOLD = HUG; 
      HAVE = LINK(HAVE); 
   endwhile 
   DATA(FRONT) = HOLD; 
end WHAT_DO_I_DO 


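Solution: The procedure WHAT_DO_I_DO rotates the data items of the linked queue Q by one 
position: each node receives the data item of its predecessor, and the front node receives the 
data item of the last node. The resultant list is given below: 

[Figure: linked queue Q after the rotation of its data items]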
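
The effect of WHAT_DO_I_DO can be checked with a direct Python transcription. This is a sketch only: the helper names `build` and `to_list`, and the sample data items a1 to a4, are assumptions made for illustration.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.link = None


def build(items):
    """Build a linked queue and return (front, rear)."""
    front = rear = None
    for item in items:
        node = Node(item)
        if front is None:
            front = rear = node
        else:
            rear.link = node
            rear = node
    return front, rear


def what_do_i_do(front):
    have = front
    hold = have.data
    while have.link is not None:
        hug = have.link.data
        have.link.data = hold         # next node receives its predecessor's data
        hold = hug
        have = have.link
    front.data = hold                 # front receives the last node's data


def to_list(front):
    out = []
    while front is not None:
        out.append(front.data)
        front = front.link
    return out


front, rear = build(["a1", "a2", "a3", "a4"])
what_do_i_do(front)
rotated = to_list(front)
```

Only the data fields move; the nodes, their addresses and the FRONT and REAR pointers stay where they were.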


Problem 7.8 Write a procedure to remove the nth element (from the top) of a linked stack 
with the rest of the elements unchanged. Contrast this with a sequential stack implementation for 
the same problem (Refer Illustrative Problem 4.2 (iii) of Chapter 4). 







Solution: To remove the nth element leaving the other elements unchanged, a linked 
implementation of the stack merely calls for sliding down the list which is easily done, and for 
a reset of a link to remove the node concerned. The procedure is given below. In contrast, a 
sequential implementation as illustrated in Illustrative Problem 4.2(iii), calls for the use of another 
temporary stack to hold the elements popped out from the original stack before pushing them 
back into it. 


procedure REMOVE(TOP, ITEM, n) 
/* The nth element is removed through ITEM */ 
   TEMP = TOP; 
   COUNT = 1; 
   while (COUNT ≠ n) do 
      PREVIOUS = TEMP; 
      TEMP = LINK(TEMP); 
      COUNT = COUNT + 1; 
   endwhile 
   if (n = 1) then TOP = LINK(TEMP);   /* the top node itself is removed */ 
   else LINK(PREVIOUS) = LINK(TEMP); 
   ITEM = DATA(TEMP); 
   RETURN(TEMP); 
end REMOVE 
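
The removal of the nth element from a linked stack can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `remove_nth` is hypothetical, it returns the new top together with the removed item, and it adds range checks that the pseudocode does not perform.

```python
class Node:
    def __init__(self, data, link=None):
        self.data = data
        self.link = link


def remove_nth(top, n):
    """Remove the nth node from the top (n = 1 is the top itself).

    Returns (new_top, item).
    """
    if n < 1 or top is None:
        raise IndexError("no such element")
    if n == 1:
        return top.link, top.data
    previous = top
    for _ in range(n - 2):            # stop at the node just before the nth
        previous = previous.link
        if previous is None:
            raise IndexError("no such element")
    target = previous.link
    if target is None:
        raise IndexError("no such element")
    previous.link = target.link       # bypass the nth node
    return top, target.data


top = None
for item in ("d", "c", "b", "a"):     # pushes leave 'a' on top
    top = Node(item, top)
top, removed = remove_nth(top, 3)
```

A single link is reset and no element is moved, in contrast with the sequential version, which must pop and re-push the elements above the one being removed.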


Problem 7.9 Given a linked stack L_S and a linked queue L_Q with equal lengths, what do 
the following procedures do to the lists? Here TOP is the top pointer of L_S, and FRONT and 
REAR are the front and rear pointers of L_Q. What are your observations regarding the functionality of 
the two procedures? 


procedure WHAT_IS_COOKING1(TOP, FRONT, REAR) 
/* TEMP, TEMP1, TEMP2 and TEMP3 are temporary variables */ 
   TEMP1 = FRONT; 
   TEMP2 = TOP; 
   while (TEMP1 ≠ Nil AND TEMP2 ≠ Nil) do 
      TEMP3 = DATA(FRONT); 
      DATA(FRONT) = DATA(TOP); 
      DATA(TOP) = TEMP3; 
      TEMP1 = LINK(TEMP1); 
      PREVIOUS = TEMP2; 
      TEMP2 = LINK(TEMP2); 
   endwhile 
   TEMP = TOP; 
   TOP = FRONT; 
   FRONT = TEMP; 
   REAR = PREVIOUS; 
end WHAT_IS_COOKING1 


procedure WHAT_IS_COOKING2(TOP, FRONT, REAR) 
/* TEMP, TEMP1, TEMP2 and TEMP3 are temporary variables */ 
   TEMP = TOP; 
   while (LINK(TEMP) ≠ Nil) do 
      TEMP = LINK(TEMP); 
   endwhile 
   TEMP1 = TOP; 
   REAR = TEMP; 
   TOP = FRONT; 
   FRONT = TEMP1; 
end WHAT_IS_COOKING2 


Solution: Both the procedures swap the contents of the linked stack L_S and the linked queue L_Q. 
While WHAT_IS_COOKING1 does it by exchanging the data items of the lists, 
WHAT_IS_COOKING2 does it by merely manipulating the pointers and hence is an elegant 
presentation. 







Problem 7.10 A Queue List Q is a list of linked queues stored according to orders of priority 
viz., A, B, C, etc., with A accorded the highest priority and so on. The LEAD nodes serve as head 
nodes for each of the priority based queues. Elements with the same priority are stored as a 
normal linked queue. Figure I 7.10 (a-b) illustrate the node structure and an example of Queue 
List respectively. 


LEAD node structure: DOWN | LEAD DATA | FOLLOW 
Queue node structure: DATA | LINK 

(a) Structure of the nodes in a Queue List 

(b) An example of Queue List (head node Q followed by lead nodes for Priority A, Priority B 
and Priority C, each leading to the front and rear of its queue) 

(c) A snapshot of a Queue List 

Fig. I 7.10 







The FOLLOW link links together the head nodes of the queues and DOWN link connects it to 
the first node in the respective queue. The LEAD DATA field may be used to store the priority 
factor of the queue. 

Here is a QUEUELIST Q stored in the memory, a snapshot of which is shown in Fig. I 7.10(c). 

There are three queues Q1, Q2, Q3 with priorities 1, 2, and 3. The Head node of QUEUELIST 
stores the number of queues in the list as a negative number. The LEAD DATA field stores the 
priority factor of each of the three queues. START points to the head node of the QUEUELIST and 
AVAILABLE SPACE is the pointer to the free storage pool. 

Obtain the QUEUELIST by tracing the lead nodes and nodes of the linked queues. 
Solution: The structure of the QUEUELIST is as shown below: 


[Figure: the traced QUEUELIST, showing the head node, the lead nodes, and the front and rear 
nodes of Queue 1 and Queue 3] 


Review Questions 


The following is a snap shot of a memory which stores a linked stack VEGETABLES and a linked 
queue FRUITS beginning at the respective addresses. Answer the following questions with regard 
to operations on the linked stack and queue, each of which is assumed to be independently 
performed on the original linked stack and queue. 


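[Memory snapshot: DATA and LINK fields of the storage holding the linked stack VEGETABLES 
(pointer TOP), the linked queue FRUITS (pointers FRONT and REAR) and the free storage pool 
(pointer AV_SP)] 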




1. Inserting PAPAYA into the linked queue FRUITS results in the following changes to the 
   FRONT, REAR and AV_SP pointers respectively, as given in: 
   (a) 10 2 2   (b) 2 6 2   (c) 2 6 5   (d) 10 2 5 
2. Undertaking pop operation on VEGETABLES results in the following changes to the TOP 
   and AV_SP pointers respectively, as given in: 
   (a) 7 1   (b) 7 2   (c) 8 2   (d) 8 1 
3. Undertaking delete operation on FRUITS results in the following changes to the FRONT, 
   REAR and AV_SP pointers respectively, as given in: 
   (a) 3 6 2   (b) 10 3 6   (c) 3 6 10   (d) 10 3 2 
4. Pushing TURNIPS into VEGETABLES results in the following changes to the TOP and 
   AV_SP pointers respectively, as given in: 
   (a) 2 5   (b) 2 9   (c) 1 5   (d) 1 9 
5. After the push operation of TURNIPS into VEGETABLES (undertaken in Review 
   Question 7.4), DATA (2) = ______ and DATA (LINK (2)) = ______ 
   (a) TURNIPS and CABBAGE   (b) CUCUMBER and CABBAGE 
   (c) TURNIPS and CUCUMBER   (d) CUCUMBER and ORANGE 
6. What are the merits of linked stacks and queues over their sequential counterparts? 
7. How is the memory storage pool associated with a linked stack data structure for its 
   operations? 
8. How are push and pop operations implemented on a linked stack? 
9. What are traversable queues? 
10. Outline the node structure and a linked queue to represent the polynomial: 
    17x°+ 1877 + 9x + 89 
11. Trace Algorithm 7.5 on the following expression to check whether parentheses are balanced: 
    (X+Y+Z)*H)+(D*T))-2 


Programming Assignments 


1. Execute a program to implement a linked stack to check for the balancing of the following 
   pairs of symbols in a Pascal program. The name of the source Pascal program is the sole 
   input to the program. 

   Symbols: begin end, ( ), [ ], { }. 

   (i) Output errors encountered during mismatch of symbols. 
   (ii) Modify the program to set right the errors. 

2. Evaluate a postfix expression using a linked stack implementation. 

3. Implement the simulation of a time sharing system discussed in Chapter 5, Sec. 5.5, using 
   linked queues. 

4. Develop a program to implement a Queue List (Refer Illustrative Problem 7.10) which is a 
   list of linked queues stored according to an order of priority. 

   Test for the insertion and deletion of the following jobs with their priorities listed within 
   brackets, on a Queue List JOB_MANAGER with three queues A, B and C listed according 
   to their order of priorities: 

   Insert Job J1 (A) 
   Insert Job J2 (B) 
   Insert Job J3 (A) 
   Insert Job J4 (B) 
   Delete Queue B 

5. Develop a program to simulate a calculator which performs the addition, subtraction, 
   multiplication and division of polynomials. 





CHAPTER 8 


TREES AND BINARY TREES 


8.1 Introduction 
8.2 Trees: Definition and Basic Terminologies 
8.3 Representation of Trees 
8.4 Binary Trees: Basic Terminologies and Types 
8.5 Representation of Binary Trees 
8.6 Binary Tree Traversals 
8.7 Threaded Binary Trees 
8.8 Applications 


Introduction 8.1 


In Chapters 3-5 we discussed the sequential data structures of arrays, stacks and queues. These 
are termed linear data structures as they are inherently uni-dimensional in structure. In 
other words, the items form a sequence or a linear list. In contrast, the data structures of trees 
and graphs are termed non-linear data structures as they are inherently two-dimensional in 
structure. Trees and their variants, binary trees and graphs, have emerged as truly 
powerful data structures registering immense contribution to the development of efficient 
algorithms or efficient solutions to various problems in science and engineering. 

In this chapter, we first discuss the tree data structure, the basic terminologies and 
representation schemes. An important variant of the tree viz., binary tree, its basic concepts, 
representation schemes and traversals are elaborately discussed next. A useful modification 
to the binary tree viz., threaded binary tree is introduced. Finally, expression trees and their 
related concepts are discussed as an application of binary trees. 


Trees: Definition and Basic Terminologies 8.2 


Definition of trees 


A tree is defined as a finite set of one or more nodes such that 
(i) there is a specially designated node called the root and 
(ii) the rest of the nodes could be partitioned into t disjoint sets ($t \ge 0$), each set representing a 
tree $T_i$, $i = 1, 2, \ldots, t$, known as a subtree of the tree. 

A node in the definition of the tree represents an item of information, and the links between 
the nodes, termed branches, represent an association between the items of information. Figure 8.1 
illustrates a tree. 

The definition of the tree emphasizes the aspects of (i) connectedness and (ii) absence of 
closed loops, or what are termed cycles. Beginning from the root node, the structure of the tree 
permits connectivity of the root to every other node in the tree. In general, any node is reachable 
from anywhere in the tree. Also, with branches providing the links between the nodes, the 
structure ensures that no set of nodes link together to form a closed loop or a cycle. 

Fig. 8.1 An example tree 


Basic terminologies of trees 


There are several basic terminologies associated with the tree. The specially designated node 
called root has already been introduced in the definition. The number of subtrees of a node is 
known as the degree of the node. Nodes that have zero degree are called leaf nodes or terminal 
nodes. The rest of them are called non-terminal nodes. The nodes which hang from branches 
emanating from a node are known as children and the node from which the branches emanate is 
known as the parent node. Children of the same parent node are referred to as siblings. The 
ancestors of a given node are those nodes that occur on the path from the root to the given node. 
The degree of a tree is the maximum degree of any node in the tree. The level of a node is defined 
by letting the root occupy level 1 (some authors let the root occupy level 0). The rest of the 
nodes occupy various levels depending on their association. Thus if a parent node occupies level 
i, its children occupy level i + 1. This gives the tree a hierarchical structure with the 
root occupying the topmost level of 1. The height or depth of a tree is defined to be the maximum 
level of any node in the tree. Some authors define the depth of a node to be the length of the longest 
path from the root node to that node, which yields the relation 


depth of the tree = height of the tree - 1 


A forest is a set of zero or more disjoint trees. The removal of the root node from a tree results 
in a forest (of its subtrees!). 

In Fig. 8.1, A is the root node. The degree of node E is 2 and that of L is 0. F, G, H, C, I, J and L are 
leaf or terminal nodes and all the remaining nodes are non-leaf or non-terminal nodes. Nodes F, 
G and H are children of B and B is a parent node. Nodes J, K and nodes F, G, H are sibling nodes 
with E and B as their respective parents. For the node L, nodes A, E and K are ancestors. The 
degree of the tree is 4, which is the maximum degree, reported by node A. While node A, which 
is the root node, occupies level 1, its children B, C, D and E occupy level 2 and so on. The height 
of the tree is its maximum level, which is 4. Removal of A yields a forest of four disjoint (sub)trees 
viz., {B, F, G, H}, {C}, {D, I} and {E, J, K, L}. 
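The terminology can be checked mechanically. The following Python sketch encodes the parent-child relationships of Fig. 8.1 as described in the paragraph above; the dictionary layout and the function names are illustrative assumptions, not part of the text.

```python
# Children of each node in the tree of Fig. 8.1, as described in the text.
CHILDREN = {
    "A": ["B", "C", "D", "E"],
    "B": ["F", "G", "H"],
    "D": ["I"],
    "E": ["J", "K"],
    "K": ["L"],
}


def children(node):
    return CHILDREN.get(node, [])


def degree(node):
    """Number of subtrees of a node."""
    return len(children(node))


def nodes(root):
    """All nodes of the tree rooted at root, root first."""
    result = [root]
    for child in children(root):
        result.extend(nodes(child))
    return result


def height(root, level=1):
    """Maximum level of any node, with the root on level 1."""
    return max([level] + [height(c, level + 1) for c in children(root)])


def ancestors(root, target, path=()):
    """Nodes on the path from the root down to, but excluding, target."""
    if root == target:
        return list(path)
    for child in children(root):
        found = ancestors(child, target, path + (root,))
        if found is not None:
            return found
    return None


all_nodes = nodes("A")
leaves = [n for n in all_nodes if degree(n) == 0]
tree_degree = max(degree(n) for n in all_nodes)
print(leaves, tree_degree, height("A"), ancestors("A", "L"))
```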







Representation of Trees 8.3 


Though trees are better understood in their pictorial forms, a common representation of a tree to 
suit its storage in the memory of a computer is a list. The tree of Fig. 8.1 could be represented 
in its list form as (A (B(F, G, H), C, D(I), E(J, K(L)))). The root node comes first followed by the list 
of subtrees of the node. This is repeated for each subtree in the tree. This list form of a tree paves 
the way for a naive representation of the tree as a linked list. The node structure of the linked list is 
shown in Fig. 8.2(a). 
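The rule "root first, followed by the list of its subtrees" lends itself to a short recursive sketch. The Python below assumes the children of each node in Fig. 8.1 are held in a dictionary; the names CHILDREN and list_form are hypothetical.

```python
# Children of each node in the tree of Fig. 8.1 (leaves are simply absent).
CHILDREN = {
    "A": ["B", "C", "D", "E"],
    "B": ["F", "G", "H"],
    "D": ["I"],
    "E": ["J", "K"],
    "K": ["L"],
}


def list_form(node):
    """Write the node first, followed by the list of its subtrees."""
    kids = CHILDREN.get(node, [])
    if not kids:
        return node
    return node + "(" + ", ".join(list_form(k) for k in kids) + ")"


form = "(" + list_form("A") + ")"
print(form)
```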


(a) General node structure: DATA | LINK 1 | LINK 2 | ... | LINK n 

(b) Linked list representation of the tree shown in Fig. 8.1 


Fig. 8.2 Linked list representation of a tree 


The DATA field of the node stores the information content of the tree node. A fixed set of 
LINK fields accommodate the pointers to the child nodes of the given node. In fact the maximum 
number of links the node would require is equal to the degree of the tree. The linked representation of 
the tree shown in Fig. 8.1 is illustrated in Fig. 8.2 (b). Observe the colossal wastage of space by 
way of null pointers! 
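The wastage can be quantified. With n nodes and k LINK fields per node there are n * k link fields in all, of which only n - 1 are used, since every node other than the root is pointed to exactly once. A small Python sketch of this count follows; the function name is an assumption, and the figures 12 and 4 are the node count and degree of the tree of Fig. 8.1.

```python
def null_links(n_nodes, tree_degree):
    """Null LINK fields when every node carries tree_degree links."""
    total_links = n_nodes * tree_degree
    used_links = n_nodes - 1        # one incoming link per non-root node
    return total_links - used_links


wasted = null_links(12, 4)
print(wasted, "of", 12 * 4, "link fields are null")
```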

An alternative representation would be to use a node structure as shown in Fig. 8.3(a). Here 
TAG = 1 indicates that the next field (DATA / DOWN LINK) is occupied by data (DATA) and TAG 
= 0 indicates that the same is used to hold a link (DOWN LINK). The node structure of the linked 
list holds a DOWN LINK whenever it encounters a child node which gives rise to a subtree. Thus 
the root node A has four child nodes, three of which viz., B, D and E give rise to subtrees. Note 
the DOWN LINK active fields of the nodes in these cases with TAG set to 0. In contrast, observe 
the linked list node corresponding to C which has no subtree. The DATA field records C with 
TAG set to 1. 


Example 8.1 We illustrate a tree structure in the organic evolution which deals with the 
derivation of new species of plants and animals from the first formed life by descent with 
modification. Figure 8.4 (a) illustrates the tree and Fig. 8.4 (b) shows its linked representation. 







(a) General node structure: TAG (0/1) | DATA / DOWNLINK | LINK 

(b) Linked representation of the tree of Fig. 8.1 


Fig. 8.3 An alternative elegant linked representation of a tree 


[Figure: tree of organic evolution with nodes Plants, Invertebrates, Primitive Chordates, 
Fishes, Amphibian, Reptiles, Birds, Mammals and Human] 

(a) Tree structure (b) Linked representation of the tree structure 


Fig. 8.4 Tree structure of organic evolution 







Binary Trees: Basic Terminologies and Types 8.4 


Basic terminologies 


A binary tree has the characteristic of all nodes having at most two branches, that is, all nodes 
have a degree of at most 2. A binary tree can therefore be empty or consist of a root node and 
two disjoint binary trees termed the left subtree and the right subtree. Figure 8.5 illustrates a 
binary tree. 

Fig. 8.5 An example of binary tree 

It is essential that the distinction between trees and binary trees is brought out clearly. While 
a binary tree can be empty with zero nodes, a tree can never be empty. Again, while the ordering 
of the subtrees in a tree is immaterial, in a binary tree the distinction between left and right subtrees 
is very clearly maintained. All other terminologies applicable to trees such as levels, degree, 
height, leaf nodes, parent, child, siblings etc. are also applicable to binary trees. However, there 
are some important observations regarding binary trees. 
(i) The maximum number of nodes on level i of a binary tree is $2^{i-1}$, $i \ge 1$. 
(ii) The maximum number of nodes in a binary tree of height h is $2^h - 1$, $h \ge 1$ (for proof refer 
Illustrative Problem 8.6). 
(iii) For any non-empty binary tree, if $t_0$ is the number of terminal nodes and $t_2$ is the number 
of nodes of degree 2, then $t_0 = t_2 + 1$ (for proof refer Illustrative Problem 8.7). 
These observations could be easily verified on the binary tree shown in Fig. 8.5. The maximum 
number of nodes on level 3 is $2^{3-1} = 2^2 = 4$. Also, with the height of the binary tree being 3, the 
maximum number of nodes is $2^3 - 1 = 7$. Again $t_0 = 4$ and $t_2 = 3$, which yields $t_0 = t_2 + 1$. 
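The three observations can be verified with a few lines of Python. The sketch below is illustrative only: it assumes a binary tree written as nested tuples (data, left, right), and uses a hypothetical seven-node tree of height 3 with nodes A to G rather than a transcription of Fig. 8.5.

```python
def leaf(x):
    return (x, None, None)


# Hypothetical binary tree of height 3 with all seven nodes present.
TREE = ("A", ("B", leaf("D"), leaf("E")), ("C", leaf("F"), leaf("G")))


def height(t):
    """Height with the root on level 1; an empty tree has height 0."""
    return 0 if t is None else 1 + max(height(t[1]), height(t[2]))


def count_by_degree(t, counts=None):
    """Return {degree: number of nodes having that degree}."""
    if counts is None:
        counts = {0: 0, 1: 0, 2: 0}
    if t is not None:
        _, left, right = t
        counts[(left is not None) + (right is not None)] += 1
        count_by_degree(left, counts)
        count_by_degree(right, counts)
    return counts


def nodes_per_level(t, level=1, acc=None):
    """Return {level: number of nodes on that level}."""
    if acc is None:
        acc = {}
    if t is not None:
        acc[level] = acc.get(level, 0) + 1
        nodes_per_level(t[1], level + 1, acc)
        nodes_per_level(t[2], level + 1, acc)
    return acc


counts = count_by_degree(TREE)
levels = nodes_per_level(TREE)
h = height(TREE)
print(levels, h, counts)
```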


Types of binary trees 


A binary tree of height h which has all its permissible maximum number of nodes viz., $2^h - 1$ 
intact is known as a full binary tree of height h. Figure 8.6(a) illustrates a full binary tree of height 
4. Note the specific method of numbering the nodes. 

A binary tree with $n'$ nodes and height h is complete if its nodes correspond to the nodes which 
are numbered 1 to $n'$ ($n' \le n$) in a full binary tree of height h. In other words, a complete binary 
tree is one in which its nodes follow a sequential numbering that increments in a left-to-right 
and top-to-bottom fashion. A full binary tree is therefore a special case of a complete binary tree. 
Also, a complete binary tree with n elements has a height h given by $h = \lceil \log_2 (n + 1) \rceil$. 
A complete binary tree obeys the following properties with regard to its node numbering: 

(i) If a parent node has a number i then its left child has the number 2i ($2i \le n$). If 2i > n then 
i has no left child. 
(ii) If a parent node has a number i, then its right child has the number 2i + 1 ($2i + 1 \le n$). If 
2i + 1 > n then i has no right child. 
(iii) If a child node (left or right) has a number i then the parent node has the number 
$\lfloor i/2 \rfloor$ if $i \ne 1$. If i = 1 then i is the root and hence has no parent. 

In the full binary tree of height 4 illustrated in Fig. 8.6(a), observe how the parent-child 
numbering is satisfied. For example, consider node s (number 4): its left child w has the number 
2*4 = 8 and its right child has the number 2*4 + 1 = 9. Again, the parent of node v (number 7) is the 
node with number $\lfloor 7/2 \rfloor = 3$, i.e. node 3, which is r. 
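The numbering properties translate directly into index arithmetic. The following Python sketch is a minimal illustration with assumed function names; n is the number of nodes in the complete binary tree. The height is computed with integer arithmetic, using the fact that n.bit_length() equals the ceiling of log2(n + 1) for n >= 1.

```python
def left_child(i, n):
    """Number of the left child of node i, or None if it has none."""
    return 2 * i if 2 * i <= n else None


def right_child(i, n):
    """Number of the right child of node i, or None if it has none."""
    return 2 * i + 1 if 2 * i + 1 <= n else None


def parent(i):
    """Number of the parent of node i, or None for the root."""
    return i // 2 if i != 1 else None


def height(n):
    """Height of a complete binary tree with n >= 1 nodes."""
    return n.bit_length()           # same value as ceil(log2(n + 1))


# Full binary tree of height 4 has 15 nodes.
print(left_child(4, 15), right_child(4, 15), parent(7), height(15))
```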


Fig. 8.6 Examples of (a) full binary tree, (b) complete binary tree and (c) skewed binary trees 
(left skewed and right skewed) 

Figure 8.6(b) illustrates an example complete binary tree. A binary tree which is dominated 
solely by left child nodes or right child nodes is called a skewed binary tree or, more specifically, 
a left skewed binary tree or right skewed binary tree respectively. Figure 8.6(c) illustrates examples 
of skewed binary trees. 


Representation of Binary Trees 8.5 





A binary tree could be represented using a sequential data structure (arrays) as well as linked 
data structure. 




Array representation of binary trees 


To represent the binary tree as an array, the sequential numbering system emphasized by a 
complete binary tree comes in handy. Consider the binary tree shown in Fig. 8.7(a). The array 
representation is as shown in Fig. 8.7(b). The association of numbers pertaining to parent and left/ 
right child nodes makes it convenient to access the appropriate cells of the array. However, the 
missing nodes in the binary tree and hence the corresponding array locations, are left empty in 
the array. This obviously leads to a lot of wastage of space. However, the array representation 
ideally suits a full binary tree due to its non wastage of space. 
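A short Python sketch makes the space wastage visible. The tree used here is hypothetical (it is not claimed to be the tree of Fig. 8.7): nodes a to f are placed at the positions their complete-binary-tree numbers dictate, and index 0 is left unused so that the root sits at index 1.

```python
# Index 0 is unused so that the root sits at index 1.
cells = [None] * 12
for index, label in {1: "a", 2: "b", 3: "c", 5: "d", 10: "e", 11: "f"}.items():
    cells[index] = label


def child(cells, i, right=False):
    """Label stored in the left (or right) child cell of cell i, if any."""
    j = 2 * i + (1 if right else 0)
    return cells[j] if j < len(cells) else None


# Cells reserved for missing nodes are simply left empty.
wasted = sum(1 for c in cells[1:] if c is None)
print(child(cells, 2, right=True), child(cells, 5), wasted)
```

Here five of the eleven cells stay empty, whereas a full binary tree would leave none.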


(a) A binary tree (b) Array representation of the binary tree (array cells numbered 1 to 11) 


Fig. 8.7 Array representation of a binary tree 


Linked representation of binary trees 


The linked representation of a binary tree has the node structure shown in Fig. 8.8(a). Here, the 
node, besides the DATA field, needs two pointers LCHILD and RCHILD to point to the left and 
right child nodes respectively. The tree is accessed by remembering the pointer to the root node 
of the tree. 





(a) Structure of the node: LCHILD | DATA | RCHILD 

(b) Linked representation of the binary tree of Fig. 8.7(a) 


Fig. 8.8 Linked representation of a binary tree 




In the binary tree T shown in Fig. 8.8(b), LCHILD (T) refers to the node storing b and RCHILD 
(LCHILD (T)) refers to the node storing d and so on. The following are some of the important 
observations regarding the linked representation of a binary tree: 

(i) If a binary tree has n nodes then the number of pointers used in its linked representation 
is 2 * n. 
(ii) The number of null pointers used in the linked representation of a binary tree with n nodes 
is n + 1. 

However, in a linked representation it is difficult to determine a parent given a child node. In any 
case, if an application so requires, a fourth field PARENT may also be included in the structure. 
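The two observations can be checked on a small example. The Python sketch below assumes a Node class with fields named after LCHILD, DATA and RCHILD and a hypothetical four-node tree; neither is taken from the figures.

```python
class Node:
    """Binary tree node with LCHILD, DATA and RCHILD fields."""
    def __init__(self, data, lchild=None, rchild=None):
        self.data = data
        self.lchild = lchild
        self.rchild = rchild


def count_nodes(t):
    return 0 if t is None else 1 + count_nodes(t.lchild) + count_nodes(t.rchild)


def count_null_pointers(t):
    """Count the LCHILD/RCHILD fields holding NIL (t must be non-empty)."""
    total = 0
    for child in (t.lchild, t.rchild):
        total += 1 if child is None else count_null_pointers(child)
    return total


# Hypothetical tree: a has children b and c; b has only a right child d.
T = Node("a", Node("b", None, Node("d")), Node("c"))
n = count_nodes(T)
nulls = count_null_pointers(T)
print(n, 2 * n, nulls)
```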





Binary Tree Traversals 8.6 


An important operation that is performed on a binary tree is its traversal. A traversal of a binary 
tree is where its nodes are visited in a particular but repetitive order, rendering a linear order of 
the nodes or information represented by them. 

A traversal is governed by three actions, viz. Move left (L), Move right (R) and Process node (P). 
In all, it yields six different combinations: LPR, LRP, PLR, PRL, RLP and RPL. Of these, three have 
emerged as significant in computer science. They are, 


LPR — Inorder traversal 
LRP — Postorder traversal 
PLR — Preorder traversal. 


The algorithms for each of the traversals are elaborated here. 


Inorder Traversal 


The traversal keeps moving left in the binary tree until one can move no further, processes the 
node and moves to the right to continue its traversal again. In the absence of any node to the 
right, it retracts backwards by a node and continues the traversal. 

Algorithm 8.1 illustrates a recursive procedure to perform inorder traversal of a binary tree. 
For clarity of application, the action Process Node (P) is interpreted as Print node. Observe how the 
recursive procedure reflects the maxim LPR repetitively. Example 8.2 illustrates the inorder 
traversal of the binary tree shown in Fig. 8.9. 


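[Figure: binary tree with root P. P has left child Q and right child R; Q has only a left child S; 
S has only a right child T; R has left child U and right child V; U has only a left child W.] 

Fig. 8.9 Binary tree to demonstrate Inorder, Postorder and Preorder traversals 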







Example 8.2 An easy method to obtain the traversal would be to run one’s fingers on the 
binary tree with the maxim: move left until no more nodes, process node, then move right and continue 
the traversal. 

An alternative method is to trace the recursive steps of the algorithm using the following 
scheme: 


Algorithm 8.1: Recursive procedure to perform Inorder traversal of a binary tree 


procedure INORDER_TRAVERSAL (NODE) 
/* NODE refers to the Root node of the binary tree 
   in its first call to the procedure. Root node is the 
   starting point of the traversal */ 
if NODE ≠ NIL then 
{  call INORDER_TRAVERSAL (LCHILD (NODE)); 
       /* Inorder traverse the left subtree (L) */ 
   print (DATA (NODE)); 
       /* Process node (P) */ 
   call INORDER_TRAVERSAL (RCHILD (NODE)); 
       /* Inorder traverse the right subtree (R) */ 
} 
end INORDER_TRAVERSAL. 


Execute the traversal of the binary tree as traverse left subtree, process root node and traverse right 
subtree. Repeat the same for each of the left and right subtrees encountered. Table 8.1 illustrates 
the traversal of the binary tree shown in Fig. 8.9, using this scheme. Each open box in the inorder 
traversal output (Column 2 of Table 8.1) represents the output of the call to the procedure 
INORDER_TRAVERSAL with the root of the appropriate subtree as its input. The final output of 
the inorder traversal is STQPWURV. 
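The recursive scheme can be sketched in Python. The tree below is an assumption: it is the shape of Fig. 8.9 as reconstructed from the traversal outputs quoted in the text, written as nested tuples (data, left, right), and the function name is illustrative.

```python
def leaf(x):
    return (x, None, None)


# Binary tree of Fig. 8.9, reconstructed from its stated traversal outputs.
FIG_8_9 = ("P",
           ("Q", ("S", None, leaf("T")), None),
           ("R", ("U", leaf("W"), None), leaf("V")))


def inorder(t):
    """Traverse left subtree (L), process node (P), traverse right subtree (R)."""
    if t is None:
        return []
    data, left, right = t
    return inorder(left) + [data] + inorder(right)


result = "".join(inorder(FIG_8_9))
print(result)   # STQPWURV
```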


Table 8.1 Inorder traversal of binary tree shown in Fig. 8.9 

Step 1. Inorder traverse the binary tree (root P; Left subtree 1, Right subtree 1). 
    Inorder traversals of the Left and Right subtrees of the root node are yet to yield their 
    output. 

Step 2. Inorder traverse Left subtree 1 (root Q; Left subtree 2, right subtree empty). 
    Inorder traversal of Left subtree 1 yields inorder traversal of Left subtree 2, process root Q, 
    and inorder traverse right subtree. However, since the right subtree is empty, its traversal 
    yields NIL output. 

Step 3. Inorder traverse Left subtree 2 (root S; left subtree empty, Right subtree 2). 

Step 4. Inorder traverse Right subtree 2 (root T; left and right subtrees empty). 
    Inorder traversal of Left subtree 1 is done. Gathering the traversal's output for Left subtree 1 
    yields STQ. The inorder traversal of Right subtree 1 needs to be performed. 
    Intermediary output: STQ P 

Step 5. Inorder traverse Right subtree 1 (root R; Left subtree 3, Right subtree 3). 
    Inorder traversals of Left subtree 3 and Right subtree 3 are yet to yield their output. 

Step 6. Inorder traverse Left subtree 3 (root U; Left subtree 4, right subtree empty). 

Step 7. Inorder traverse Left subtree 4 (root W; left and right subtrees empty). 

Step 8. Inorder traverse Right subtree 3 (root V; left and right subtrees empty). 
    Inorder traversal of Right subtree 1 is done. Gathering the traversal's output yields WURV. 
    Final output: STQPWURV 










Postorder Traversal 


The traversal proceeds by keeping to the left until it is no longer possible, turns right to begin 
again or, if there is no node to the right, processes the node and retraces its direction by one node 
to continue its traversal. 

Algorithm 8.2 illustrates a recursive procedure to perform postorder traversal of a binary tree. 
The recursive procedure reflects the maxim LRP invoked repetitively. Example 8.3 illustrates the 
postorder traversal of the binary tree shown in Fig. 8.9. The traversal output is TSQWUVRP. 


Algorithm 8.2: Recursive procedure to perform Postorder traversal of a binary tree 


procedure POSTORDER_TRAVERSAL (NODE) 
/* NODE refers to the Root node of the binary tree 
   in its first call to the procedure. Root node is the 
   starting point of the traversal */ 
if NODE ≠ NIL then 
{  call POSTORDER_TRAVERSAL (LCHILD (NODE)); 
       /* Postorder traverse the left subtree (L) */ 
   call POSTORDER_TRAVERSAL (RCHILD (NODE)); 
       /* Postorder traverse the right subtree (R) */ 
   print (DATA (NODE)); 
       /* Process node (P) */ 
} 
end POSTORDER_TRAVERSAL. 


Example 8.3 As pointed out in Example 8.2, an easy method would be to run one’s fingers 
on the binary tree with the maxim: move left until there are no more nodes and turn right to continue 
traversal. If there is no right node, process node, retract by one node and continue traversal. 

An alternative method would be to trace the recursive steps of the algorithm using the scheme: 
Traverse left subtree, Traverse right subtree and Process root node. Table 8.2 illustrates the traversal of 
the binary tree shown in Fig. 8.9 using this scheme. 
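The scheme Traverse left subtree, Traverse right subtree and Process root node can be sketched in Python as below. The tree is an assumption: the shape of Fig. 8.9 as reconstructed from the traversal outputs quoted in the text, written as nested tuples (data, left, right).

```python
def leaf(x):
    return (x, None, None)


# Binary tree of Fig. 8.9, reconstructed from its stated traversal outputs.
FIG_8_9 = ("P",
           ("Q", ("S", None, leaf("T")), None),
           ("R", ("U", leaf("W"), None), leaf("V")))


def postorder(t):
    """Traverse left subtree (L), traverse right subtree (R), process node (P)."""
    if t is None:
        return []
    data, left, right = t
    return postorder(left) + postorder(right) + [data]


result = "".join(postorder(FIG_8_9))
print(result)   # TSQWUVRP
```

Note that the root P appears last, after the complete outputs of both subtrees.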


Preorder Traversal 


The traversal processes every node as it moves left until it can move no further. Now it turns right 
to begin again or, if there is no node to the right, retracts until it can move right to continue its 
traversal. 

The recursive procedure for the preorder traversal is illustrated in Algorithm 8.3. The recursive 
procedure reflects the maxim PLR invoked repetitively. Example 8.4 illustrates the preorder 
traversal of the binary tree shown in Fig. 8.9. The traversal output is PQSTRUWV. 


Example 8.4 An easy method as discussed before would be to trace the traversal on the 
binary tree using the maxim: Process nodes while moving left until no more nodes, turn right, and 
otherwise retract to continue the traversal. 

An alternative method is to trace the recursive steps of the algorithm using the following 
scheme: 




Table 8.2 Postorder traversal of binary tree shown in Fig. 8.9 

Step 1. Postorder traverse the binary tree (root P; Left subtree 1, Right subtree 1). 
    Postorder traversals of the Left and Right subtrees of the root are yet to yield their output. 

Step 2. Postorder traverse Left subtree 1 (root Q; Left subtree 2, right subtree empty). 

Step 3. Postorder traverse Left subtree 2 (root S; left subtree empty, Right subtree 2). 

Step 4. Postorder traverse Right subtree 2 (root T; left and right subtrees empty). 
    Postorder traversal of Left subtree 1 is done. Gathering the traversal's output for Left 
    subtree 1 yields TSQ. 
    Intermediary output: TSQ 

Step 5. Postorder traverse Right subtree 1 (root R; Left subtree 3, Right subtree 3). 

Step 6. Postorder traverse Left subtree 3 (root U; Left subtree 4, right subtree empty). 

Step 7. Postorder traverse Left subtree 4 (root W; left and right subtrees empty). 

Step 8. Postorder traverse Right subtree 3 (root V; left and right subtrees empty). 
    Postorder traversal of Right subtree 1 is done. Gathering the output yields WUVR. 
    Final output: TSQWUVRP 







Algorithm 8.3: Recursive procedure to perform Preorder traversal of a binary tree 


procedure PREORDER_TRAVERSAL (NODE) 
/* NODE refers to the Root node of the binary tree 
   in its first call to the procedure. Root node is the 
   starting point of the traversal */ 
if NODE ≠ NIL then 
{  print (DATA (NODE)); 
       /* Process node (P) */ 
   call PREORDER_TRAVERSAL (LCHILD (NODE)); 
       /* Preorder traverse the left subtree (L) */ 
   call PREORDER_TRAVERSAL (RCHILD (NODE)); 
       /* Preorder traverse the right subtree (R) */ 
} 
end PREORDER_TRAVERSAL. 


Execute the traversal of the binary tree as, process root node, traverse left subtree and traverse right 
subtree, repeating the same for each of the left and right subtrees encountered. 
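For contrast with the recursive procedure, the Python sketch below performs the same PLR traversal with an explicit stack instead of recursion. This is a swapped-in technique, not the book's algorithm, and the tree is an assumption: the shape of Fig. 8.9 as reconstructed from the traversal outputs quoted in the text, written as nested tuples (data, left, right).

```python
def leaf(x):
    return (x, None, None)


# Binary tree of Fig. 8.9, reconstructed from its stated traversal outputs.
FIG_8_9 = ("P",
           ("Q", ("S", None, leaf("T")), None),
           ("R", ("U", leaf("W"), None), leaf("V")))


def preorder(t):
    """Process node (P), then the left subtree (L), then the right subtree (R)."""
    output, stack = [], [t]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        data, left, right = node
        output.append(data)     # process the node first
        stack.append(right)     # pushed first, so it is traversed last
        stack.append(left)      # pushed last, so it is traversed next
    return output


result = "".join(preorder(FIG_8_9))
print(result)   # PQSTRUWV
```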

Table 8.3 illustrates the preorder traversal of the binary tree shown in Fig. 8.9 using this 
scheme. 


Table 8.3 Preorder traversal of binary tree shown in Fig. 8.9 

Step 1. Preorder traverse the binary tree (root P; Left subtree 1, Right subtree 1). 
    Process root P. Preorder traversals of Left subtree 1 and Right subtree 1 are yet to yield their 
    output. 

Step 2. Preorder traverse Left subtree 1 (root Q; Left subtree 2, right subtree empty). 

Step 3. Preorder traverse Left subtree 2 (root S; left subtree empty, Right subtree 2). 

Step 4. Preorder traverse Right subtree 2 (root T; left and right subtrees empty). 
    Preorder traversal of Left subtree 1 is done. Gathering the traversal's output yields QST. 
    Intermediary output: P QST 

Step 5. Preorder traverse Right subtree 1 (root R; Left subtree 3, Right subtree 3). 

Step 6. Preorder traverse Left subtree 3 (root U; Left subtree 4, right subtree empty). 

Step 7. Preorder traverse Left subtree 4 (root W; left and right subtrees empty). 

Step 8. Preorder traverse Right subtree 3 (root V; left and right subtrees empty). 
    Preorder traversal of Right subtree 1 is done. Gathering the traversal's output yields RUWV. 
    Final output: PQSTRUWV 


Some significant observations pertaining to the traversals of a binary tree are the following: 





(i) Given a preorder traversal of a binary tree, the root node is the first occurring item in the 
list. 
(ii) Given a postorder traversal of a binary tree, the root node is the last occurring item in the 
list. 
(iii) Inorder traversal does not directly reveal the root node of the binary tree. 
(iv) An inorder traversal coupled with any one of preorder or post order traversal helps trace 
back the structure of the binary tree (Refer Illustrative Problems 8.3, 8.4.). 


Threaded Binary Trees 8.7 


The linked representation of the binary tree discussed in Section 8.5 showed that for a binary tree
with n nodes, 2n pointers are required, of which (n+1) are null pointers. A.J. Perlis and C. Thornton
devised a prudent method to utilize these (n+1) empty pointers, introducing what are called
threads. Threads are also links or pointers, but they replace null pointers by pointing to some useful
information in the binary tree. Thus, for a node NODE, if RCHILD(NODE) is NIL then the null
pointer is replaced by a thread which points to the node that would occur immediately after NODE
when the binary tree is traversed in inorder. Again, if LCHILD(NODE) is NIL then the null pointer is
replaced by a thread to the node which would immediately precede NODE when the binary tree
is traversed in inorder.

Figure 8.10. illustrates a threaded binary tree. The threads are indicated using broken lines to 
distinguish them from the normal links indicated with solid lines. The inorder traversal of the 
binary tree is also shown in the figure. 

Note that the left child of G and the right child of E have threads which are left dangling due 
to the absence of an inorder predecessor and successor respectively. 


Inorder Traversal: G F B A D C E





Fig. 8.10 A threaded binary tree 







There are many ways to thread a binary tree T, corresponding to the specific traversal chosen.
In this work, the threading corresponds to an inorder traversal. The threading can also take one
of two representations, viz., one-way threading and two-way threading. In one-way threading,
a thread appears only in the RCHILD field of a node, when it is null, and points to the
inorder successor of the node (refer Illustrative Problem 8.10). In two-way threading, which was
introduced above, a thread also appears in the LCHILD field, if it is null, and points to the
inorder predecessor of the node. However, the first and the last nodes in the inorder traversal
will carry dangling threads.


Linked representation of a threaded binary tree 


A linked representation of the threaded binary tree (two-way threading) has a node structure as 
shown in Fig. 8.11. 





LEFT THREAD TAG    LCHILD    DATA    RCHILD    RIGHT THREAD TAG
(True or False)                                (True or False)


Fig. 8.11 Node Structure of a linked representation of a threaded binary tree 


Since the LCHILD and RCHILD fields are utilized to represent both links and threads it becomes 
essential for the node structure to clearly distinguish between them to avoid confusion while 
processing the threaded binary tree. Hence it is necessary that the node structure includes two 
more fields which act as flags to indicate if the LCHILD and RCHILD fields represent a thread 
or a link. 

If the LEFT THREAD TAG or the RIGHT THREAD TAG is marked true, then the corresponding LCHILD
or RCHILD field represents a thread; otherwise it represents a link. Also, to tuck in the dangling
threads which are bound to arise, the linked representation of a threaded binary tree includes a
head node. The dangling threads point to the head node. The head node by convention has its
LCHILD pointing to the root node of the threaded binary tree and therefore has its LEFT
THREAD TAG set to false. The RIGHT THREAD TAG field is also set to false, but the RCHILD
link points to the head node itself. Figure 8.12(a) shows the linked representation of an empty
threaded binary tree and Fig. 8.12(b) that of a non-empty threaded binary tree.
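
The node structure and the head node convention described above can be sketched in Python as follows. This is only an illustrative sketch: the class name TNode, the field names and the three-node example tree (B with children A and C) are assumptions made here and do not come from the text. The sketch shows how the threads allow an inorder traversal without recursion or a stack.

```python
class TNode:
    """Node of a two-way threaded binary tree (names are illustrative)."""
    def __init__(self, data):
        self.data = data
        self.lchild = None
        self.rchild = None
        self.left_thread = False    # True: lchild is a thread, not a link
        self.right_thread = False   # True: rchild is a thread, not a link


def inorder_successor(node):
    """Return the node that follows `node` in inorder."""
    if node.right_thread:
        return node.rchild          # a thread points straight at the successor
    nxt = node.rchild               # a link: go right once ...
    while not nxt.left_thread:      # ... then follow left links as far as possible
        nxt = nxt.lchild
    return nxt


def inorder(head):
    """Inorder traversal starting from the head node; needs no stack."""
    out = []
    node = inorder_successor(head)
    while node is not head:         # dangling threads lead back to the head node
        out.append(node.data)
        node = inorder_successor(node)
    return out


def build_example():
    """Hand-thread the hypothetical tree B(A, C) below a head node."""
    head, a, b, c = TNode(None), TNode("A"), TNode("B"), TNode("C")
    head.lchild, head.rchild = b, head        # both tags of the head stay False
    b.lchild, b.rchild = a, c                 # ordinary links
    a.lchild, a.left_thread = head, True      # no inorder predecessor: head node
    a.rchild, a.right_thread = b, True        # inorder successor of A is B
    c.lchild, c.left_thread = b, True         # inorder predecessor of C is B
    c.rchild, c.right_thread = head, True     # no inorder successor: head node
    return head


print(inorder(build_example()))   # ['A', 'B', 'C']
```

Because the RCHILD link of the head node points to the head node itself with its tag set to false, the same successor routine applied to the head node reaches the first node of the inorder sequence.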


Growing threaded binary trees 


Here we discuss the insertion of nodes contributing to the growth of threaded binary trees. The 
insertion of a node calls not only for the realignment of links but also of the threads involved. 

Consider the case of inserting a node NEW to the right of a node NODE in the threaded binary
tree. If the node NODE had no right subtree then the case is trivial. Attach NEW as the right child
of NODE and appropriately reset the threads (LCHILD and RCHILD) of NEW to point to its
inorder predecessor and successor respectively. Figure 8.13(a) illustrates this insertion.

In the next case, if NODE already had a right subtree then attach NEW as the right child of 
the node NODE and link the previous right subtree of NODE to the right of node NEW. When 
this is done, the threads of the appropriate nodes are reset as shown in Fig. 8.13(b). 

A similar procedure is followed to insert a node in the left subtree of a threaded binary tree. 
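
The two cases of inserting NEW to the right of NODE can be sketched in Python as below. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, and the example starts from a single-node tree whose dangling threads point to a head node. When NODE already has a right subtree, the only other thread that has to be reset is the left thread of the leftmost node of that subtree, whose inorder predecessor changes from NODE to NEW.

```python
class TNode:
    """Node of a two-way threaded binary tree (names are illustrative)."""
    def __init__(self, data):
        self.data = data
        self.lchild = None
        self.rchild = None
        self.left_thread = False
        self.right_thread = False


def insert_right(node, new):
    """Insert `new` as the right child of `node`, resetting links and threads."""
    new.rchild = node.rchild              # NEW takes over NODE's right field,
    new.right_thread = node.right_thread  # whether it was a thread or a subtree
    new.lchild = node                     # inorder predecessor of NEW is NODE
    new.left_thread = True
    node.rchild = new                     # NODE now has a real right child
    node.right_thread = False
    if not new.right_thread:
        # NODE had a right subtree: its leftmost node used to thread back
        # to NODE and must now thread back to NEW.
        succ = new.rchild
        while not succ.left_thread:
            succ = succ.lchild
        succ.lchild = new


def inorder(head):
    """Thread-driven inorder traversal, used here only to check the result."""
    out, node = [], head
    while True:
        if node.right_thread:
            node = node.rchild
        else:
            node = node.rchild
            while not node.left_thread:
                node = node.lchild
        if node is head:
            return out
        out.append(node.data)


# Hypothetical starting point: a head node and a single root B.
head, b = TNode(None), TNode("B")
head.lchild, head.rchild = b, head
b.lchild, b.left_thread = head, True
b.rchild, b.right_thread = head, True

d, c = TNode("D"), TNode("C")
insert_right(b, d)    # case 1: right subtree of B is empty
insert_right(b, c)    # case 2: right subtree of B (node D) is non-empty
print(inorder(head))  # ['B', 'C', 'D']
```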




[Figure: (a) Empty threaded binary tree, showing a head node with the fields LEFT THREAD TAG, LCHILD, DATA, RCHILD and RIGHT THREAD TAG; (b) Non-empty threaded binary tree with its head node]


Fig. 8.12 Linked representation of threaded binary trees 


Application 8.8 





In this section we discuss an application of binary trees in expression trees which have a 
significant role to play in the principles of compiler design. 


Expression trees 


Expressions—arithmetic and logical—are an inherent component of programming languages. The 
following are examples of arithmetic and logical expressions. 

• ((A+B)*C-D)↑G (arithmetic expression)

• (A ∧ B) ∨ (B ∧ E) ∧ ¬F (logical expression)

• (T < W) ∨ (A ≤ B) ∧ (C ≠ E) (logical expression)


(b) Insertion of node NEW to the right of NODE: Right subtree of NODE is non-empty


Fig. 8.13 Insertion of a node in the right subtree of a threaded binary tree


The three forms in which expressions are represented, viz., infix, postfix and prefix, were detailed
in Sec. 4.3. To quickly review, an infix expression, which is the commonly used representation of
an expression, follows the scheme <operand> <operator> <operand>.

Examples are A + B, A * B.

Postfix expressions follow the scheme <operand> <operand> <operator>.
Examples are AB+, AB*.

Prefix expressions follow the scheme <operator> <operand> <operand>.
Examples are +AB, *AB.

Binary trees have found an application in the representation of expressions. An expression tree
has the operands of the expression as its terminal or leaf nodes and the operators as its
non-terminal nodes. The arity of the operator is therefore restricted to be 1 or 2 (unary or binary) and
this is what is commonly encountered in arithmetic and logical expressions. Figure 8.14 illustrates
example expression trees.

The hierarchical precedence and associativity rules of the operators in terms of the expressions
are reflected in the orientation of the subtrees or the sibling nodes. Table 8.4 illustrates some
examples showing the orientation of the binary tree in accordance with the precedence and
associativity rules of the operators in the expression terms.





Expression: (P * Q) + C ↑ -D          Expression: (xy) V (mp ^z)

(a) Arithmetic                        (b) Logical
Fig. 8.14 Example of expression trees


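Table 8.4 Orientation of the binary trees with regard to expressions

Expression: A + B
Expression tree: root +, with left child A and right child B
Remarks: Observe the orientation of the sibling nodes. The left operand A and the right
operand B become the left and right child nodes of the operator + respectively.

Expression: A + B + C
Expression tree: root +, with left subtree (A + B) and right child C
Remarks: The left associativity rule satisfied by + is reflected in the orientation of the subtrees.

Expression: A ↑ B ↑ C
Expression tree: root ↑, with left child A and right subtree (B ↑ C)
Remarks: The right associativity rule satisfied by ↑ is reflected in the orientation of the subtrees.

Expression: A * B - C / D
Expression tree: root -, with left subtree (A * B) and right subtree (C / D)
Remarks: The hierarchical precedence relation among the operators decides the orientation
of the subtrees.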










Traversals of an expression tree 


Section 8.6 detailed the traversals of a binary tree. With an expression tree essentially being a
binary tree, the traversal of an expression tree also yields significant results. Thus, the inorder
traversal of an expression tree yields an infix expression, the postorder traversal a postfix
expression and the preorder traversal a prefix expression. The output of the algorithms
INORDER_TRAVERSAL(), PREORDER_TRAVERSAL() and POSTORDER_TRAVERSAL() on any
given expression tree can be verified against the hand computed infix, prefix and postfix
expressions (discussed in Sec. 4.3).
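
As a small illustration of this correspondence, the following Python sketch builds the expression tree of A+B*C by hand and traverses it in the three orders. The class name Node and the function names are assumptions made for this sketch. Note that a plain inorder traversal emits no parentheses, so it reproduces the infix form only when the original expression needs none.

```python
class Node:
    """Expression tree node: operators are internal nodes, operands are leaves."""
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def inorder(t):
    return "" if t is None else inorder(t.left) + t.data + inorder(t.right)


def preorder(t):
    return "" if t is None else t.data + preorder(t.left) + preorder(t.right)


def postorder(t):
    return "" if t is None else postorder(t.left) + postorder(t.right) + t.data


# A + B * C: since * has precedence over +, B * C forms the right subtree of +.
tree = Node("+", Node("A"), Node("*", Node("B"), Node("C")))

print(inorder(tree))    # A+B*C  (infix)
print(preorder(tree))   # +A*BC  (prefix)
print(postorder(tree))  # ABC*+  (postfix)
```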


Conversion of infix expression to postfix expression 


We utilize this opportunity to introduce a significant concept of infix to postfix expression 
conversion which finds a useful place in the theory of compiler design. 

Given an infix expression, for example A+B*C, the objective is to obtain its postfix equivalent 
ABC*+. In Chapter 4, Sec. 4.3, a hand coded method of conversion was illustrated. In this section, 
we introduce an algorithm to perform the same. 

The algorithm makes use of a stack as its work space and is governed by two priority factors,
viz., In Stack Priority (ISP) and Incoming Priority (ICP) of the operators participating in the
conversion. Thus, those operators already pushed into the stack during the process of conversion
command an ISP in relation to those which are just about to be pushed into the stack (ICP).
Table 8.5 illustrates the ISP and ICP of a common list of arithmetic operators.


Table 8.5 ISP and ICP of a common list of arithmetic operators


) 





The rule which operates during the conversion of an infix expression to a postfix expression is:
pop operators out of the work stack so long as the ICP of the incoming operator is less than or
equal to the ISP of the operators already available in the stack.

The input infix expression is padded with a "$" to signal end of input. The bottom of the work
stack is also signaled with a "$" symbol with ISP($) = -1. Algorithm 8.4 illustrates the
pseudocode procedure to convert an infix expression into a postfix expression.


Algorithm 8.4: Procedure to convert infix expression to postfix expression


procedure INFIX_POSTFIX_CONV(E)
   /* Convert an infix expression E padded with a "$"
      as its end-of-input symbol into its equivalent
      postfix expression */
   x := getnextchar(E);
   /* obtain the next character from E */
   while x ≠ "$" do
      case x of
         :x is an operand: print x;
         :x = ')': while (top element of stack ≠ '(') do
                      print top element of stack and pop stack;
                   end while;
                   pop '(' from stack;
         :else: while ICP(x) ≤ ISP(top element of stack) do
                   print top element of stack and pop stack;
                end while
                push x into stack;
      end case
      x := getnextchar(E);
   end while
   while stack is non empty do
      print top element of stack and pop stack;
   end while
end INFIX_POSTFIX_CONV.


Example 8.5 illustrates the conversion of an infix expression into its equivalent postfix
expression using Algorithm INFIX_POSTFIX_CONV().


Example 8.5 Consider an infix expression A*(B+C)-G. Table 8.6 illustrates the conversion 
into its postfix equivalent. 


Table 8.6 Conversion of A*(B+C)-G$ into its postfix form

Input character    Work stack    Postfix expression    Remarks
fetched by
getnextchar()

A                  $             A                     Print A.
*                  $*            A                     Since ICP(*) > ISP($) push * into stack.
(                  $*(           A                     Since ICP('(') > ISP(*) push ( into stack.
B                  $*(           AB                    Print B.
+                  $*(+          AB                    Since ICP(+) > ISP('(') push + into stack.
C                  $*(+          ABC                   Print C.
)                  $*            ABC+                  Pop elements from stack until '(' is
                                                       reached. Also pop '(' from stack.
-                  $-            ABC+*                 Since ICP(-) < ISP(*) pop * from the
                                                       stack. Push '-' into stack.
G                  $-            ABC+*G                Print G.
$                  $             ABC+*G-               End of input ($) reached. Empty
                                                       contents of stack.
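
The conversion traced above can be sketched in Python as follows. The sketch follows the structure of Algorithm 8.4, but several details are assumptions made here: the numeric ISP and ICP values are commonly used ones chosen for this illustration, "^" stands in for the exponentiation operator, operands are single letters or digits, and the input is assumed to be a well-formed expression.

```python
# In-stack and incoming priorities (assumed values for this sketch).
ISP = {"^": 3, "*": 2, "/": 2, "+": 1, "-": 1, "(": 0, "$": -1}
ICP = {"^": 4, "*": 2, "/": 2, "+": 1, "-": 1, "(": 4}


def infix_to_postfix(expr):
    """Convert a well-formed infix string (optionally ending in '$') to postfix."""
    stack = ["$"]          # '$' marks the bottom of the work stack
    out = []
    for x in expr:
        if x == "$":       # end-of-input symbol
            break
        if x.isspace():
            continue
        if x.isalnum():    # operand: print it
            out.append(x)
        elif x == ")":     # pop until the matching '(' and discard it
            while stack[-1] != "(":
                out.append(stack.pop())
            stack.pop()
        else:              # operator or '(': pop while ICP(x) <= ISP(top)
            while ICP[x] <= ISP[stack[-1]]:
                out.append(stack.pop())
            stack.append(x)
    while stack[-1] != "$":   # empty the stack down to the bottom marker
        out.append(stack.pop())
    return "".join(out)


print(infix_to_postfix("A*(B+C)-G$"))  # ABC+*G-
print(infix_to_postfix("A+B*C$"))      # ABC*+
```

Giving "(" and "^" an incoming priority higher than any in-stack priority means they are always pushed, which is how parenthesised sub-expressions and the right associativity of exponentiation are handled in this sketch.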


















ADT for Binary Trees


Data objects:


A binary tree of nodes each holding one (or more) data field(s) DATA and
two link fields, LCHILD and RCHILD. T points to the root node of the
binary tree.


Operations:


• Check if binary tree T is empty
  CHECK_BINTREE_EMPTY(T) (Boolean function)

• Make a binary tree T empty by setting T to NIL
  MAKE_BINTREE_EMPTY(T)

• Move to the left subtree of a node X by moving down its LCHILD pointer
  MOVE_LEFT_SUBTREE(X)

• Move to the right subtree of a node X by moving down its RCHILD pointer
  MOVE_RIGHT_SUBTREE(X)

• Insert node containing element ITEM as the root of the binary tree T;
  Ensure that T does not point to any node before execution
  INSERT_ROOT(T, ITEM)

• Insert node containing ITEM as the left child of node X; Ensure that
  X does not have a left child node before execution
  INSERT_LEFT(X, ITEM)

• Insert node containing ITEM as the right child of node X; Ensure that
  X does not have a right child node before execution
  INSERT_RIGHT(X, ITEM)

• Delete root node of binary tree T; Ensure that the root does not have
  child nodes
  DELETE_ROOT(T)

• Delete node pointed to by X from the binary tree and set X to point
  to the left child of the node; Ensure that the node pointed to by X
  does not have a right child
  DELETE_POINT_LEFTCHILD(X)

• Delete node pointed to by X from the binary tree and set X to point
  to the right child of the node; Ensure that the node pointed to by X
  does not have a left child
  DELETE_POINT_RIGHTCHILD(X)

• Store ITEM into a node whose address is X
  STORE_DATA(X, ITEM)

• Retrieve data of a node whose address is X and return it in ITEM
  RETRIEVE_DATA(X, ITEM)

• Perform Inorder traversal of binary tree T
  INORDER_TRAVERSAL(T)

• Perform Preorder traversal of binary tree T
  PREORDER_TRAVERSAL(T)

• Perform Postorder traversal of binary tree T
  POSTORDER_TRAVERSAL(T)




Summary


> Trees and binary trees are non-linear data structures, which are inherently two dimensional 
in structure. 

> While trees are non-empty and may have nodes of any degree, a binary tree may be empty
or hold nodes of degree at most two.

> The terminologies of root node, height, level, parent, children, sibling, ancestors, leaf or 
terminal nodes and non-terminal nodes are applicable to both trees and binary trees. 

> While trees are efficiently represented using linked representations, binary trees are 
represented using both array and linked representations. 

> The traversals of a binary tree are inorder, postorder and preorder. 

> A prudent use of null pointers in the linked representation of a binary tree yields a 
threaded binary tree. 

> The application of binary trees has been demonstrated on expression trees and its related 
concepts. 

> The ADT of the binary tree is presented. 





Illustrative Problems


Problem 8.1 For the binary tree shown in Fig. I 8.1,

(a) Identify
(i) Root (ii) Children of G (iii) Parent of D (iv) Siblings of Z
(v) Level of C (vi) Ancestors of Y (vii) Leaf nodes (viii) Height
of the binary tree

(b) Obtain the inorder, postorder and preorder traversals of the
binary tree.

Fig. I 8.1

Solution: 


(a) (i) Root: B (ii) Children of G: Y, D (iii) Parent of D: G (iv) Siblings of Z: None (v) Level of
C: 2 (vi) Ancestors of Y: G, B (vii) Leaf nodes: E, Z, D (viii) Height of the binary tree: 4.
(b) Inorder traversal : ECBYZGD 
The output of the traversal which follows the scheme of Algorithm 8.1 can be dissected as 


E C                             B               Y Z G D
traverse left subtree of root   process root    traverse right subtree of root


Postorder traversal : ECZYDGB. 
The output of the traversal following the scheme of Algorithm 8.2 can be dissected as. 


E C                             Z Y D G                          B
traverse left subtree of root   traverse right subtree of root   process root







Preorder traversal : BCEGYZD 
The output of the traversal following the scheme of Algorithm 8.3 can be dissected as 


B C E G Y Z D 
Process root traverse left subtree of root traverse right subtree of root 
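
The three traversals can be cross-checked with a short Python sketch. The tree is entered by hand and is an assumption in the sense that it is reconstructed from the answers of Problem 8.1 (root B; C with left child E; G with children Y and D; Z as the right child of Y) rather than copied from the figure. The class and function names are illustrative.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def inorder(t):
    """Left subtree, root, right subtree."""
    return [] if t is None else inorder(t.left) + [t.data] + inorder(t.right)


def postorder(t):
    """Left subtree, right subtree, root."""
    return [] if t is None else postorder(t.left) + postorder(t.right) + [t.data]


def preorder(t):
    """Root, left subtree, right subtree."""
    return [] if t is None else [t.data] + preorder(t.left) + preorder(t.right)


# Tree reconstructed from the solution of Problem 8.1 (an assumption).
root = Node("B",
            Node("C", Node("E")),
            Node("G", Node("Y", None, Node("Z")), Node("D")))

print("".join(inorder(root)))    # ECBYZGD
print("".join(postorder(root)))  # ECZYDGB
print("".join(preorder(root)))   # BCEGYZD
```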


Problem 8.2 Obtain an array representation and a linked representation of the binary tree 
shown in Fig. I 8.1. 


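Solution: To obtain the array representation we first number the nodes of the binary tree akin
to that of a complete binary tree. With the root B numbered 1, and the children of the node
numbered i numbered 2i and 2i+1, the nodes C, G, E, Y, D and Z are numbered 2, 3, 4, 6, 7 and 13
respectively, yielding the array:

 B    C    G    E    -    Y    D    -    -    -     -     -     Z
[1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10]  [11]  [12]  [13]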
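
The numbering scheme behind the array representation can be sketched in Python as below. It is a sketch under assumptions: the tree is the one reconstructed from the traversals of Problem 8.1, the names are illustrative, and the rule used is the usual complete-binary-tree numbering in which the root sits at index 1 and the children of the node at index i sit at indices 2i and 2i+1.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def to_array(root):
    """Return a list in which slot i holds the node numbered i (slot 0 unused)."""
    slots = {}

    def place(node, i):
        if node is not None:
            slots[i] = node.data
            place(node.left, 2 * i)       # left child goes to index 2i
            place(node.right, 2 * i + 1)  # right child goes to index 2i + 1

    place(root, 1)
    size = max(slots, default=0)
    return [None] + [slots.get(i) for i in range(1, size + 1)]


# Tree reconstructed from the solution of Problem 8.1 (an assumption).
root = Node("B",
            Node("C", Node("E")),
            Node("G", Node("Y", None, Node("Z")), Node("D")))

arr = to_array(root)
print(arr[1:])
# ['B', 'C', 'G', 'E', None, 'Y', 'D', None, None, None, None, None, 'Z']
```

The unused slots show the cost of the array representation for a tree that is far from complete: 13 slots are needed to hold 7 nodes.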


The linked representation is given as: 







Problem 8.3 A binary tree T has 9 nodes. The inorder and preorder traversals of T yield the
following:
Inorder traversal (I) : E A C K F H D B G
Preorder traversal (P) : F A E K C D H G B
Draw the binary tree T.


Solution: The key to the solution of this problem is the observation that the first occurring node
in a preorder traversal is the root of the binary tree and that, once the root is known, the nodes
forming the left and the right subtrees can be extracted from the inorder traversal list. Application
of this key to each of the left and right subtrees, by obtaining their respective roots from the
preorder traversal and moving on to the inorder traversal to obtain the nodes forming the
sub-subtrees, can eventually lead to the tracing of the binary tree.

From P: Root of the binary tree is F.

From I: the nodes forming the left and right subtrees of F are

E A C K          F        H D B G
Left subtree     Root     Right subtree

The binary tree can be roughly traced as the root F with the left subtree {E, A, C, K} and the
right subtree {H, D, B, G}.

In the next step we proceed to obtain the structure of the left and right subtrees.
From P: Root of the left subtree is A and root of the right subtree is D.

F        A E K C          D H G B
         Left subtree     Right subtree

From I: the nodes forming the left and right sub-subtrees are

E                 C K                H                 B G
Left subtree      Right subtree      Left subtree      Right subtree
of node A         of node A          of node D         of node D

Tracing the binary tree yields the root F with children A and D; A has the left child E and the
right subtree {C, K}; D has the left child H and the right subtree {B, G}.

Proceeding in a similar fashion, we obtain the roots of the subtrees {C, K} and {B, G} to be K and
G respectively. The final trace yields the binary tree with root F, whose children are A and D;
A has the children E and K, with C the left child of K; D has the children H and G, with B the
left child of G.
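
The tracing procedure used in this solution can be expressed as a short recursive Python sketch. It assumes that all node labels are distinct, and it represents a tree as a nested tuple (root, left, right); both the representation and the function names are choices made for this illustration.

```python
def build(inorder, preorder):
    """Rebuild a binary tree from its inorder and preorder sequences."""
    if not inorder:
        return None
    root = preorder[0]                  # first preorder item is the root
    k = inorder.index(root)             # items left of it form the left subtree
    left = build(inorder[:k], preorder[1:k + 1])
    right = build(inorder[k + 1:], preorder[k + 1:])
    return (root, left, right)


def postorder(tree):
    """Postorder listing, used here only to inspect the rebuilt tree."""
    if tree is None:
        return ""
    root, left, right = tree
    return postorder(left) + postorder(right) + root


# Traversals of Problem 8.3.
tree = build("EACKFHDBG", "FAEKCDHGB")

print(tree[0])          # F
print(tree[1][0])       # A  (root of the left subtree)
print(tree[2][0])       # D  (root of the right subtree)
print(postorder(tree))  # ECKAHBGDF
```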


Problem 8.4 Make use of the infix and postfix expressions given below to trace the
corresponding expression tree.
Infix: A + B * C / F ↑ H
Postfix: A B C * F H ↑ / +

Solution: The key to the problem is similar to the one discussed in Illustrative Problem 8.3, but
for the difference that the root node to be picked from the postfix expression is the last occurring
node.

Thus the expression tree traced in the first step has + as its root, with A forming the left subtree
and B * C / F ↑ H forming the right subtree.

In the next step the expression tree traced has / as the root of the right subtree, with B * C
forming its left subtree and F ↑ H forming its right subtree.

Progressing in this way, the final expression tree is obtained, in which * has the children B and C
and ↑ has the children F and H.

Note: Though the expression tree could be easily traced from the infix expression alone, the objective
of the problem is to emphasize the fact that a binary tree can be traced from its inorder and
postorder traversals as well.


Problem 8.5 What does the following pseudocode procedure do to the binary tree given in
Fig. I 8.5, when invoked as WHAT_DO_I_DO(THIS)?


procedure WHAT_DO_I_DO(HERE)
   if HERE ≠ NIL then
   {  call WHAT_DO_I_DO(LCHILD(HERE));
      if (LCHILD(HERE) = NIL) and (RCHILD(HERE) = NIL)
         then print DATA(HERE);
      call WHAT_DO_I_DO(RCHILD(HERE));
   }
end WHAT_DO_I_DO.


Fig. I 8.5


Solution: We trace the recursive procedure using a stack and, for convenience of representing
the nodes in the stack, have numbered the nodes from 1 to 7. For every call of WHAT_DO_I_DO()
we keep track of HERE, LCHILD(HERE) and RCHILD(HERE).

The first call of WHAT_DO_I_DO(THIS) results in the following snapshot of the stack:

HERE      LCHILD(HERE)    RCHILD(HERE)
Node 1    Node 2          Node 3

In the subsequent calls the snapshot of the stack is given by

[Figure: stack snapshots for the calls WHAT_DO_I_DO(Node 4) and WHAT_DO_I_DO(NIL), each
recording HERE, LCHILD(HERE) and RCHILD(HERE)]

When HERE = NIL, that call of WHAT_DO_I_DO(HERE) terminates and the control returns to the
previous call, viz., WHAT_DO_I_DO(Node 4). Here LCHILD(HERE) = RCHILD(HERE) = NIL. Hence
DATA(Node 4), viz., D is printed. Now the control moves further to invoke the call
WHAT_DO_I_DO(RCHILD(Node 4)), that is WHAT_DO_I_DO(NIL), which again terminates. Now the
control returns to the call WHAT_DO_I_DO(Node 2) and so on. It is easy to see that
WHAT_DO_I_DO(THIS) prints the data fields of all leaf nodes of the binary tree. Hence the
output is D, G and F.
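
A direct Python rendering of the procedure is sketched below. The seven-node tree used to exercise it is hypothetical (the figure is not reproduced here); it was chosen only so that its leaves, read from left to right, are D, G and F. The function collects the data into a list instead of printing it, which is a convenience of this sketch.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def what_do_i_do(here, out):
    """Inorder walk that records the data of leaf nodes only."""
    if here is not None:
        what_do_i_do(here.left, out)
        if here.left is None and here.right is None:
            out.append(here.data)      # both subtrees empty: a leaf node
        what_do_i_do(here.right, out)
    return out


# Hypothetical tree with 7 nodes whose leaves are D, G and F.
this = Node("A",
            Node("B", Node("D"), Node("E", Node("G"))),
            Node("C", None, Node("F")))

print(what_do_i_do(this, []))   # ['D', 'G', 'F']
```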


Problem 8.6 Show that the maximum number of nodes in a binary tree of height h is $2^h - 1$,
$h \geq 1$.

Solution: It is known that the maximum number of nodes in level i of a binary tree is $2^{i-1}$. Given
the height of the binary tree to be h, which is the maximum level, the maximum number of nodes
is given by

$$\sum_{i=1}^{h} 2^{i-1} = 1 + 2 + 2^2 + \cdots + 2^{h-1} = 2^h - 1$$


Problem 8.7 Show that for a non-empty binary tree T, if $n_0$ is the number of leaf nodes and $n_2$
the number of nodes of degree 2, then $n_0 = n_2 + 1$.


Solution: Let n be the number of nodes in a non-empty binary tree and let $n_1$ be the number of
nodes of degree 1.
Now,

$$n = n_0 + n_1 + n_2 \qquad \text{...(i)}$$

Again, if b is the number of links or branches in the binary tree, all nodes except the root node
hang from a branch, yielding the relation

$$b = n - 1 \qquad \text{...(ii)}$$

Also, each branch emanates from a node whose degree is either 1 or 2. Hence,

$$b = n_1 + 2n_2 \qquad \text{...(iii)}$$

Subtracting (iii) from (ii) yields

$$n = n_1 + 2n_2 + 1 \qquad \text{...(iv)}$$

From (iv) and (i) we obtain

$$n_0 = n_2 + 1$$


Problem 8.8 A binary tree is stored in the memory of a computer as shown below. Trace the
structure of the binary tree. 


LCHILD DATA RCHILD 





Solution: Given the root node’s address to be 7 we begin tracing the binary tree from the root 
onwards. The binary tree is given by: 










Problem 8.9 Outline a linked representation for the tree and a threaded binary tree
representation for the binary tree shown in Fig. I 8.9(a) and (b), respectively.


(a) Tree (b) Binary Tree
Fig. I 8.9


Solution: Following the node structure of TAG, DATA/DOWNLINK, LINK illustrated in 
Sec. 8.3 the linked representation of the tree is given by 





The threaded binary representation of Fig. I 8.9(b) is illustrated below and is obtained by following 
the node structure detailed in Sec. 8.7. 




F : False 
T : True 





The inorder traversal sequence to be tracked by the threads is : y z x v u w. The threads are linked 
to the appropriate inorder successors and predecessors. 







Problem 8.10 For the binary tree T given in Fig. I 8.10 obtain (i) a one-way inorder threading
of T and (ii) a one-way preorder threading of T.


Fig. I 8.10


Solution:
(i) The inorder traversal of the binary tree T yields: V J A R K U

A one-way threading of T is obtained by replacing the RCHILD links of the nodes, which
are null, by threads pointing to the inorder successor of the node. Thus, the RCHILD link
of J points to A, that of R points to K and that of U is either kept dangling or, if there is a
head node, points to the same.

[Figure: the one-way inorder threaded binary tree]

(ii) The preorder traversal of the binary tree T yields: A V J K R U.

For the one-way preorder threading, the RCHILD links of the nodes, which are null, are set
to point to the preorder successors of the node. Thus, the RCHILD link of J points to K, that
of R points to U and the same of U is a dangling thread or may be connected to the head
node if available.

[Figure: the one-way preorder threaded binary tree]
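
The two threadings can be checked with a small Python sketch. The tree is entered by hand and is reconstructed from the inorder and preorder sequences quoted in the solution (A as root; V with right child J; K with children R and U), so it is an assumption rather than a copy of the figure. For simplicity the sketch only reports which node each thread would point to, instead of storing the threads in the RCHILD fields with a tag; None stands for a dangling thread.

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def inorder(t):
    return [] if t is None else inorder(t.left) + [t] + inorder(t.right)


def preorder(t):
    return [] if t is None else [t] + preorder(t.left) + preorder(t.right)


def one_way_threads(root, order):
    """Map each node with a null RCHILD to its successor under `order`."""
    seq = order(root)
    threads = {}
    for i, node in enumerate(seq):
        if node.right is None:
            succ = seq[i + 1] if i + 1 < len(seq) else None
            threads[node.data] = succ.data if succ else None  # None: dangling
    return threads


# Tree reconstructed from the traversals in the solution (an assumption).
t = Node("A",
         Node("V", None, Node("J")),
         Node("K", Node("R"), Node("U")))

print(one_way_threads(t, inorder))   # {'J': 'A', 'R': 'K', 'U': None}
print(one_way_threads(t, preorder))  # {'J': 'K', 'R': 'U', 'U': None}
```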








Review Questions


1. Which among the following is not a property of a tree? 
(i) There is a specially designated node called the root 
(ii) The rest of the nodes could be partitioned into t disjoint sets (t ≥ 0), each set
representing a tree T_i, i = 1, 2, ..., t, known as a subtree of the tree.
(iii) Any node should be reachable from anywhere in the tree 
(iv) At most one cycle could be present in the tree 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
2. The maximum number of nodes in a binary tree of depth k is 
(a) k=l (b) Mel) = 4 (c) Del (d) 9(k+1)-1 
3. For a binary tree of 2.k nodes, k > 1, the number of pointers and the number of null pointers 
that the tree would use for its representation is respectively given by 


(a) k and k+1 (b) 2.k and 2.k+1 (c) 4.k and 4.k+1 (d) 4.k and 2.k+1
4. An inorder and postorder traversal of a binary tree was ‘claimed’ to yield the following 
sequence: 
Inorder traversal : HAT GLOVE SOCKS SCARF GLASSES 
Post order traversal : GLOVE SCARF HAT GLASSES SOCKS


What are your observations? 
(i) HAT is the root of the binary tree 
(ii) SOCKS is the root of the binary tree 
(iii) the binary tree is a skewed binary tree 
(iv) the traversals are incorrect 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
5. Which of the following observations with regard to binary tree traversals is incorrect? 
(i) Given a preorder traversal of a binary tree, the root node is the first occurring item in 
the list. 
(ii) Given a postorder traversal of a binary tree, the root node is the last occurring item in 
the list. 
(iii) Inorder traversal does not directly reveal the root node of the binary tree. 
(iv) To trace back the structure of the binary tree, inorder, postorder and preorder traversal 
sequences are needed. 
6. Sketch (i) an array representation and (ii) a linked list representation for the following 


binary tree: 







7. Sketch a linked representation for a threaded binary tree equivalent of the binary tree shown 
in Review Questions 6 (Chapter 8). 
8. Obtain inorder and post order traversals for the binary tree shown in Review Questions 6 
(Chapter 8). 
9. Draw an expression tree for the following logical expression: 
p and (q or not k) and (s or b or h) 
10. Undertake post order traversal of the expression tree obtained in Review Questions 9 
(Chapter 8) and compare it with the hand computed postfix form of the logical expression. 
11. Given the following inorder and preorder traversals, trace the binary tree. 
Inorder traversal : BFGHPRSTWYZ 
Preorder traversal: PF BHGSRYTWZ 
12. Making use of Algorithm 8.4, convert the following infix expression to its equivalent postfix 
form and evaluate the postfix expression for the specified values: 
(x+y+z) ↑ (a+b) - g*n*m + r
C2] 1922224) 05,07 225 fae aa a7 


Programming Assignments


1. Write a program to input a binary tree implemented as a linked representation. Execute 
Algorithms 8.1-8.3 to perform inorder, postorder and preorder traversals of the binary tree. 

2. Implement Algorithm 8.4 to convert an infix expression into its postfix form. 

3. Write a recursive procedure to count the number of nodes in a binary tree. 

4. Implement a threaded binary tree. Write procedures to insert a node NEW to the left of node 
NODE when 

(i) the left subtree of NODE is empty, and 
(ii) the left subtree of NODE is non-empty. 

5. Write non-recursive procedures to perform the inorder, postorder and preorder traversals of 
a binary tree. 

6. Level order traversal: It is a kind of binary tree traversal where elements in the binary tree 
are traversed by levels, top to bottom and within levels, left to right. Write a procedure to 
execute the level order traversal of a binary tree.(Hint: Use a Queue data structure) 
Example: Level order traversal of the following binary tree is: 8 4 7 5 3 9


7. Implement the ADT of a binary tree in a language of your choice. Include operations to 
(i) obtain the height of a binary tree and (ii) the list of leaf nodes. 


CHAPTER 9


GRAPHS





9.1 Introduction
9.2 Definitions and Basic Terminologies
9.3 Representations of Graphs
9.4 Graph Traversals
9.5 Applications


In Chapter 8 we introduced trees and graphs as examples of non-linear data structures. To recall,
non-linear data structures, unlike linear data structures which are uni-dimensional in structure
(for example, arrays), are inherently two-dimensional in structure.

Though in the field of computer science, trees have been recognized as efficient non-linear data
structures with their own set of terminologies and concepts to suit the needs of the digital
computer, graph theory, which has emerged as an independent field, encompasses studies on
trees as well. In other words, in the field of graph theory, a tree is
a special kind of graph holding a definition which in principle agrees with that of a tree data
structure, but is devoid of most of the terminologies and concepts tagged to it from the viewpoint
of data structures. This distinction needs to be borne in mind when one defines a tree, or rather
'redefines' a tree, as a special kind of graph in this chapter.

Though graph theory has turned out to be a vast area with innumerable applications, we
restrict the scope of this chapter to introducing graphs as effective data structures only. Hence
only those concepts and terminologies needed to promote this aspect of graphs are dealt with.


Introduction 9.1 


The history of graphs dates back to 1736 in what is now referred to as the classical Koenigsberg
bridge problem. In the town of Koenigsberg in Eastern Prussia, the island of Kneiphof existed in
the middle of the river Pregel. The river bifurcated itself, bordering the land areas as shown in Fig. 9.1.
There were seven bridges connecting the land areas as shown in the figure. The problem was to
find if the people of the town could walk on the seven bridges once only, starting from any land
area, and returning to the starting land area after traversing all the bridges.

An example walk is listed below:

Start from the land area P, traverse bridge 1; land area R, traverse bridge 3; land area
Q, traverse bridge 4; land area R, traverse bridge 5; land area S, traverse bridge 7; land area
Q, traverse bridge 7; land area S, traverse bridge 6; land area P, traverse bridge 2; land area R.

This walk neither traverses all the bridges once only nor returns to its starting point,
which is land area P.


[Figure: the river Pregel with the land areas and the seven bridges; legend: Land Area, Bridge]


Fig. 9.1 The Koenigsberg bridge problem 


It was left to Euler to solve the puzzle of the Koenigsberg bridge problem by stating that there is
no way that people could walk across the bridges once only and return to the starting point. The
solution to the problem was arrived at by representing the land areas as circles called vertices and
the bridges as arcs called links or edges connecting the circles. Defining the degree of a vertex to
be the number of arcs converging on it, or in other words, the number of bridges which descend
on a land area, Euler showed that a walk is possible only when all the vertices have even degree.
That is, every land area needs to have an even number of bridges descending on it. In the case of
the Koenigsberg bridge problem, all the vertices turned out to have an odd degree. Figure 9.2
illustrates the graph representation of the Koenigsberg bridge problem. This vertex-edge
representation is what came to be known as a graph (here it is a multigraph). The walk which
begins from a vertex and returns to it after traversing all the edges in the graph once only came
to be known as an Eulerian walk.


Fig. 9.2 Graph representation of the Koenigsberg bridge problem


Since this first application, graph theory has grown in leaps and bounds to encompass a wide
range of applications in the fields of cybernetics, electrical sciences, genetics and linguistics, to
quote a few.
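
Euler's degree argument is easy to check mechanically. In the Python sketch below, the end points of the seven bridges are read off the example walk listed earlier (for instance, bridge 1 joins P and R), so the bridge table is an inference from that walk rather than data given explicitly in the text. The function names are illustrative, and the even-degree test assumes a connected multigraph.

```python
from collections import Counter

# Bridge number -> the two land areas it joins (inferred from the example walk).
bridges = {
    1: ("P", "R"), 2: ("P", "R"), 3: ("R", "Q"), 4: ("Q", "R"),
    5: ("R", "S"), 6: ("S", "P"), 7: ("S", "Q"),
}


def degrees(edges):
    """Degree of each vertex: the number of edge ends converging on it."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def closed_walk_possible(edges):
    """Euler's condition for a connected multigraph: every degree is even."""
    return all(d % 2 == 0 for d in degrees(edges).values())


deg = degrees(bridges.values())
print(deg)                                     # {'P': 3, 'R': 5, 'Q': 3, 'S': 3}
print(closed_walk_possible(bridges.values()))  # False
```

All four land areas have an odd number of bridges, which agrees with the conclusion that no such walk exists.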


Definitions and Basic Terminologies 9.2





Graph 


A graph G = (V, E) consists of a finite non-empty set of vertices V, also called points or nodes, and a finite set E of unordered pairs of distinct vertices called edges, arcs or links.


Example Figure 9.3 illustrates a graph. Here V = {a, b, c, d} and E = {(a, b), (a, c), (b, c), (c, d)}. However, it is convenient to represent edges using labels as shown in the figure.





Fig. 9.3 A graph 







V : Vertices : {a, b, c, d}
E : Edges : $\{e_1, e_2, e_3, e_4\}$

A graph G = (V, E) where $E = \emptyset$ is called a null or empty graph. A graph with one vertex and no edges is called a trivial graph.


Multigraph 


A multigraph G = (V, E) also consists of a set of vertices and edges, except that E may contain multiple edges (i.e., edges connecting the same pair of vertices) or loops, also called self edges (i.e., edges whose end points are the same vertex).


Example Figure 9.4 illustrates a multigraph. Observe the multiple edges connecting vertices a, b and the multiple edges $e_5$, $e_6$, $e_7$ connecting vertices c, d. Also note the self edge.

Fig. 9.4 A multigraph

However, it has to be made clear that graphs do not contain multiple edges or loops and hence are different from multigraphs. The definitions and terminologies to be discussed in this section are applicable only to graphs.





Directed and undirected graphs 


A graph whose definition (stated in Sec. 9.2) makes reference to unordered pairs of vertices as edges is known as an undirected graph. The edge $e_{ij}$ of such an undirected graph is represented as $(v_i, v_j)$ where $v_i$, $v_j$ are distinct vertices. Thus an undirected edge $(v_i, v_j)$ is equivalent to $(v_j, v_i)$.

On the other hand, directed graphs or digraphs make reference to edges which are directed, i.e., edges which are ordered pairs of vertices. The edge $e_{ij}$ is referred to as $\langle v_i, v_j \rangle$, which is distinct from $\langle v_j, v_i \rangle$, where $v_i$, $v_j$ are distinct vertices. In $\langle v_i, v_j \rangle$, $v_i$ is known as the tail of the edge and $v_j$ as the head.


Example Figure 9.5(a-b) illustrates a digraph and an undirected graph. 





(a) Digraph (b) Undirected graph 


Fig. 9.5 A digraph and an undirected graph 


In Fig. 9.5(a), $e_1$ is a directed edge between $v_1$ and $v_2$, i.e., $e_1 = \langle v_1, v_2 \rangle$, whereas in Fig. 9.5(b) $e_1$ is an undirected edge between $v_1$ and $v_2$, i.e., $e_1 = (v_1, v_2)$.







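The list of vertices and edges of graphs $G_1$ and $G_2$ are:

Vertices ($G_1$): $\{v_1, v_2, v_3, v_4\}$

Vertices ($G_2$): $\{v_1, v_2, v_3, v_4\}$

Edges ($G_1$): $\{\langle v_1, v_2 \rangle, \langle v_2, v_1 \rangle, \langle v_1, v_3 \rangle, \langle v_3, v_4 \rangle, \langle v_4, v_3 \rangle\}$ or $\{e_1, e_2, e_3, e_4, e_5\}$

Edges ($G_2$): $\{(v_1, v_2), (v_1, v_3), (v_2, v_3), (v_3, v_4)\}$ or $\{e_1, e_2, e_3, e_4\}$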

In the case of an undirected edge $(v_i, v_j)$ in a graph, the vertices $v_i$, $v_j$ are said to be adjacent, or the edge $(v_i, v_j)$ is said to be incident on vertices $v_i$, $v_j$. Thus in Fig. 9.5(b) vertices $v_1$, $v_3$ are adjacent to vertex $v_2$, and edges $e_1$: $(v_1, v_2)$, $e_3$: $(v_2, v_3)$ are incident on vertex $v_2$.

On the other hand, if $\langle v_i, v_j \rangle$ is a directed edge, then $v_i$ is said to be adjacent to $v_j$ and $v_j$ is said to be adjacent from $v_i$. The edge $\langle v_i, v_j \rangle$ is incident to both $v_i$ and $v_j$. Thus in Fig. 9.5(a) vertices $v_2$ and $v_3$ are adjacent from $v_1$, and $v_1$ is adjacent to vertices $v_2$ and $v_3$. The edges incident to vertex $v_3$ are $\langle v_1, v_3 \rangle$, $\langle v_3, v_4 \rangle$ and $\langle v_4, v_3 \rangle$.


Complete graphs 


The number of distinct unordered pairs $(v_i, v_j)$, $v_i \ne v_j$, in a graph with n vertices is $^nC_2 = \frac{n(n-1)}{2}$.

An n vertex undirected graph with exactly $\frac{n(n-1)}{2}$ edges is said to be complete.


Example Figure 9.6 illustrates a complete graph. The undirected graph with 4 vertices has all its $^4C_2 = 6$ edges intact.


Fig. 9.6 A complete graph 


In the case of a digraph with n vertices, the maximum number of edges is given by $^nP_2 = n(n-1)$. Such a graph with exactly $n(n-1)$ edges is said to be a complete digraph.


Example Figure 9.7(a) illustrates a digraph which is complete and Fig. 9.7(b) a graph 
which is not complete. 


(a) Complete (b) Not complete 
Fig. 9.7 Digraphs which are complete and not complete 







Subgraph 
A subgraph $G' = (V', E')$ of a graph G = (V, E) is such that $V' \subseteq V$ and $E' \subseteq E$.


Example Figure 9.8 illustrates some subgraphs of the directed and undirected graphs shown in Fig. 9.5 (graphs $G_1$ and $G_2$).

(a) Subgraphs of $G_1$ (b) Subgraphs of $G_2$
Fig. 9.8 Subgraphs of graphs $G_1$ and $G_2$ (Fig. 9.5)


Path 


A path from a vertex $v_i$ to vertex $v_j$ in an undirected graph G is a sequence of vertices $v_i, v_{l_1}, v_{l_2}, \ldots, v_{l_k}, v_j$ such that $(v_i, v_{l_1}), (v_{l_1}, v_{l_2}), \ldots, (v_{l_k}, v_j)$ are edges in G. If G is directed then the path from $v_i$ to $v_j$, more specifically known as a directed path, consists of edges $\langle v_i, v_{l_1} \rangle, \langle v_{l_1}, v_{l_2} \rangle, \ldots, \langle v_{l_k}, v_j \rangle$ in G.


Example Figure 9.9(a) illustrates a path $P_1$ from vertex $v_1$ to $v_4$ in graph $G_1$ of Fig. 9.5(a) and Fig. 9.9(b) illustrates a path $P_2$ from vertex $v_1$ to $v_4$ of graph $G_2$ of Fig. 9.5(b).

(a) A path from $v_1$ to $v_4$ in directed graph $G_1$: $P_1 = \{v_1, v_2, v_1, v_3, v_4\}$
(b) A path from $v_1$ to $v_4$ in undirected graph $G_2$: $P_2 = \{v_1, v_2, v_3, v_4\}$

Fig. 9.9 Path between vertices of a graph (Fig. 9.5)

The length of a path is the number of edges on it.

Example In Fig. 9.9 the length of path $P_1$ is 4 and the length of path $P_2$ is 3.

A simple path is a path in which all the vertices except possibly the first and last vertices are distinct.







Example In graph $G_2$ (Fig. 9.5(b)), the path from $v_1$ to $v_4$ given by $\{(v_1, v_2), (v_2, v_3), (v_3, v_4)\}$ and written as $\{v_1, v_2, v_3, v_4\}$ is a simple path, whereas the path from $v_3$ to $v_4$ given by $\{(v_3, v_1), (v_1, v_2), (v_2, v_3), (v_3, v_4)\}$ and written as $\{v_3, v_1, v_2, v_3, v_4\}$ is a path but not a simple path, due to the repetition of vertices.

Also in graph $G_1$ (Fig. 9.5(a)) the path from $v_1$ to $v_3$ given by $\{\langle v_1, v_2 \rangle, \langle v_2, v_1 \rangle, \langle v_1, v_3 \rangle\}$, written as $\{v_1, v_2, v_1, v_3\}$, is not a simple path but a mere path due to the repetition of vertices. However, the path from $v_2$ to $v_4$ given by $\{\langle v_2, v_1 \rangle, \langle v_1, v_3 \rangle, \langle v_3, v_4 \rangle\}$, written as $\{v_2, v_1, v_3, v_4\}$, is a simple path.

A cycle is a simple path in which the first and last vertices are the same. A cycle is also known as a circuit, elementary cycle, circular path or polygon.


Example In graph $G_2$ (Fig. 9.5(b)) the path $\{v_1, v_2, v_3, v_1\}$ is a cycle. Also, in graph $G_1$ (Fig. 9.5(a)) the path $\{v_1, v_2, v_1\}$ is a cycle or more specifically a directed cycle.


Connected graphs 


Two vertices $v_i$, $v_j$ in a graph G are said to be connected only if there is a path in G between $v_i$ and $v_j$. In an undirected graph, if $v_i$ and $v_j$ are connected then it automatically holds that $v_j$ and $v_i$ are also connected.

An undirected graph is said to be a connected graph if every pair of distinct vertices $v_i$, $v_j$ is connected.


Example Graph $G_2$ (Fig. 9.5(b)) is connected whereas graph $G_3$ shown in Fig. 9.10 is not connected.


In the case of an undirected graph which is not connected, each maximal connected subgraph is called a connected component or simply a component.


Example Graph $G_3$ (Fig. 9.10) has two connected components, viz., subgraphs $G_{31}$ and $G_{32}$.


[Figure: graph $G_3$ and its two components, subgraph $G_{31}$ and subgraph $G_{32}$]


Fig. 9.10 An undirected graph with two connected components 


A directed graph is said to be strongly connected if every pair of distinct vertices $v_i$, $v_j$ is connected by means of directed paths in both directions. Thus if there exists a directed path from $v_i$ to $v_j$ then there also exists a directed path from $v_j$ to $v_i$.


Example Graph $G_4$ shown in Fig. 9.11 is strongly connected.







Fig. 9.11 A strongly connected graph 


However, the digraph shown in Fig. 9.12 is not strongly connected but is said to possess two 
strongly connected components. A strongly connected component is a maximal subgraph that is 
strongly connected. 


[Figure: graph $G_5$ and its strongly connected components, subgraphs $G_{51}$ and $G_{52}$]


Fig. 9.12 Strongly connected components of a digraph 


Trees 


A tree is defined to be a connected acyclic graph. The following properties are satisfied by a tree: 
(i) There exists a path between any two vertices of the tree, and 
(ii) No cycles must be present in the tree. In other words, trees are acyclic. 


Example Figure 9.13(a) illustrates a tree. Figure 9.13(b) illustrates graphs which are not 
trees due to the violation of the property of acyclicity and connectedness respectively. 


[Figure: graph $G_6$ (a tree) and graphs $G_7$, $G_8$ (not trees)]


(a) Tree (b) Graphs that are not trees 
Fig. 9.13 Graphs which are trees and not trees 


Note the marked absence of any hierarchical structure and its allied terminologies of parent, 
child, sibling, ancestor, level etc insisted upon in the tree data structure. However, both the 
definitions of trees—as a data structure and a type of graph—agree on the principles of 
connectedness and acyclicity. 







Degree 


The degree of a vertex in an undirected graph is the number of edges incident to that vertex. A vertex with degree one is called a pendant vertex or end vertex. A vertex with degree zero, which hence has no incident edges, is called an isolated vertex.


Example In graph $G_2$ (Fig. 9.5(b)) the degree of vertex $v_3$ is 3 and that of vertex $v_2$ is 2.

In the case of digraphs, we define the indegree of a vertex v to be the number of edges with v as the head and the outdegree of a vertex v to be the number of edges with v as the tail.


Example In graph $G_1$ (Fig. 9.5(a)) the indegree of vertex $v_3$ is 2 and the outdegree of vertex $v_2$ is 1.
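The definitions of degree, indegree and outdegree can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `degrees` and `in_out_degrees` are hypothetical, vertices are written as plain integers, and the undirected edge list used below is the one listed for graph $G_2$.

```python
from collections import Counter

def degrees(edges):
    """Degree of each vertex of an undirected graph given as (u, v) pairs."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1          # the edge is incident on u
        deg[v] += 1          # ... and on v
    return dict(deg)

def in_out_degrees(arcs):
    """Indegree and outdegree of each vertex of a digraph given as (tail, head) pairs."""
    indeg, outdeg = Counter(), Counter()
    for tail, head in arcs:
        outdeg[tail] += 1    # the edge leaves its tail
        indeg[head] += 1     # ... and enters its head
    return dict(indeg), dict(outdeg)

# Edges of G2: (v1, v2), (v1, v3), (v2, v3), (v3, v4)
g2_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(degrees(g2_edges))     # {1: 2, 2: 2, 3: 3, 4: 1}
```

Vertex 4 comes out with degree one, so it is a pendant vertex of $G_2$.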


Isomorphic graphs 


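Two graphs are said to be isomorphic if there is a one-to-one correspondence between their vertices which preserves adjacency. Isomorphic graphs necessarily satisfy the following:
(i) they have the same number of vertices
(ii) they have the same number of edges
(iii) they have an equal number of vertices with a given degree
These conditions are necessary but not sufficient: two graphs may satisfy all three and still fail to be isomorphic.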


Example Figure 9.14 illustrates two graphs which are isomorphic. 





Fig. 9.14 /somorphic graphs 


The property of isomorphism can be verified on the lists of vertices and edges of the two 
graphs Gg and Gy when superimposed as shown below: 


Vertices (Gg) 


Vertices (Go) 


Degree of the vertices : 
Edges (Gg) 
Edges (Gg) 





Cut set 


A cut set in a connected graph G is a set of edges whose removal from G leaves G disconnected, provided the removal of no proper subset of these edges disconnects the graph G. A cut set is also known as a proper cut set, cocycle or minimal cut set.







Example Figure 9.15 illustrates a cut set of the graph $G_{10}$. The cut set $\{e_1, e_4\}$ disconnects the graph into two components as shown in the figure. $\{e_5\}$ is another cut set of the graph.







[Figure: graph $G_{10}$ and the cut set $\{e_1, e_4\}$]

(a) A graph (b) A cut set of the graph
Fig. 9.15 A cut set of a graph 


Labeled graphs 


A graph G is called a labeled graph if its edges and / or vertices are assigned some data. In 
particular if the edge e is assigned a non negative number l(e) then it is called the weight or length 
of the edge e. 


Example Figure 9.16 illustrates a labeled graph. A graph with weighted edges is also known as a network.

Fig. 9.16 A labeled graph


Eulerian graph


A walk starting at any vertex, going through each edge exactly once and terminating at the start vertex is called an Eulerian walk or Euler line.

The Koenigsberg bridge problem was in fact a problem of obtaining an Eulerian walk for the 
graph concerned. The solution to the problem discussed in Sec. 9.1 can be rephrased as, an 
Eulerian walk is possible only if the degree of each vertex in the graph is even. 

Given a connected graph G, G is an Euler graph iff all the vertices are of even degree. 
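The even-degree condition is easy to test mechanically. The sketch below is a minimal illustration, not a procedure from this chapter: the function name `has_eulerian_walk` is hypothetical, and the Koenigsberg multigraph is encoded in its commonly stated form with the land areas named A to D (an assumption; the labels of Fig. 9.2 may differ).

```python
from collections import defaultdict

def has_eulerian_walk(vertices, edges):
    """True if the (multi)graph is connected and every vertex has even degree."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Connectivity check with an explicit stack.
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    if seen != set(vertices):
        return False
    # The length of an adjacency list equals the degree of its vertex.
    return all(len(adj[v]) % 2 == 0 for v in vertices)

# Seven bridges: land area A has degree 5, the other three have degree 3.
koenigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
               ("A", "D"), ("B", "D"), ("C", "D")]
print(has_eulerian_walk("ABCD", koenigsberg))                            # False
print(has_eulerian_walk([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))  # True
```

The second call uses a simple four-vertex cycle, in which every vertex has degree 2.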


Example Figure 9.17 illustrates an Euler graph. $\{e_1, e_2, e_3, e_4\}$ shows an Eulerian walk. The even degree of the vertices may be noted.

Fig. 9.17 An Euler graph


Hamiltonian circuit


A Hamiltonian circuit in a connected graph G is defined as a closed walk that traverses every vertex of G exactly once, except of course the starting vertex at which the walk terminates.

A circuit in a connected graph G is said to be Hamiltonian if it includes every vertex of G. If any edge is removed from a Hamiltonian circuit then what remains is referred to as a Hamiltonian path. A Hamiltonian path traverses every vertex of G.







Example Figure 9.18 illustrates a Hamiltonian circuit. 


Hamiltonian circuit: $\{v_1, v_3, v_4, v_2, v_6, v_5, v_1\}$


Fig. 9.18 A Hamiltonian circuit 


Representations of Graphs 9.3





The representation of graphs in a computer can be categorized as (i) sequential representation and 
(ii) linked representation. Of the two, though sequential representation has several methods, all of 
them follow a matrix representation thereby calling for their implementation using arrays. 

The linked representation of a graph makes use of a singly linked list as its fundamental data 
structure. 


Sequential representation of graphs 


The sequential or matrix representation of graphs has the following methods:
(i) Adjacency matrix representation 

(ii) Incidence matrix representation 

(iii) Circuit matrix representation 

(iv) Cut set matrix representation 
(v) Path matrix representation 


Adjacency matrix representation 


The adjacency matrix of a graph G with n vertices is an n x n symmetric binary matrix given by $A = [a_{ij}]$ defined as

$a_{ij} = 1$ if the $i$th and $j$th vertices are adjacent, i.e., there is an edge connecting the $i$th and $j$th vertices
$a_{ij} = 0$ otherwise, i.e., if there is no edge linking the vertices.


Example Figure 9.19(a) illustrates an undirected graph whose adjacency matrix is shown in Fig. 9.19(b).
It can easily be seen that while adjacency matrices of undirected graphs are symmetric, nothing can be said about the symmetry of the adjacency matrix of a digraph.




           v1  v2  v3  v4
    v1  [   0   1   1   0  ]
A = v2  [   1   0   1   0  ]
    v3  [   1   1   0   1  ]
    v4  [   0   0   1   0  ]

(a) Undirected graph $G_{11}$ (b) Adjacency matrix of graph $G_{11}$


Fig. 9.19 Adjacency matrix of an undirected graph 


Example Figure 9.20(a-b) illustrates a digraph and its adjacency matrix representation. 


Graph $G_{12}$

           v1  v2  v3  v4
    v1  [   0   1   0   0  ]
    v2  [   0   0   1   1  ]
    v3  [   1   0   0   0  ]
    v4  [   1   0   0   0  ]

(a) Digraph (b) Adjacency matrix representation of a digraph


Fig. 9.20 Adjacency matrix representation of a digraph 
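The construction of an adjacency matrix from an edge list can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `adjacency_matrix` is hypothetical, vertices are numbered 1 to n, and the matrix is held as a list of rows.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build the n x n binary adjacency matrix of a graph on vertices 1..n."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1          # edge from the i-th to the j-th vertex
        if not directed:
            a[j - 1][i - 1] = 1      # undirected edges are recorded both ways
    return a

# An undirected graph with edges (1,2), (1,3), (2,3), (3,4)
for row in adjacency_matrix(4, [(1, 2), (1, 3), (2, 3), (3, 4)]):
    print(row)
```

For an undirected graph the result always equals its own transpose; with `directed=True` only the entry for (tail, head) is set, so symmetry is no longer guaranteed.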


Incidence matrix representation 


Let G be a graph with n vertices and e edges. Define an n x e matrix $M = [m_{ij}]$ whose n rows correspond to n vertices and e columns correspond to e edges, as

$m_{ij} = 1$ if the $j$th edge $e_j$ is incident on the $i$th vertex $v_i$
$m_{ij} = 0$ otherwise.

Matrix M is known as the incidence matrix representation of the graph G.


Example Consider the graph $G_{13}$ shown in Fig. 9.21(a). The incidence matrix representation for the graph is given in Fig. 9.21(b).

           e1  e2  e3  e4  e5  e6
    v1  [   1   1   0   0   0   0  ]
    v2  [   1   0   1   1   0   0  ]
    v3  [   0   1   1   0   1   0  ]
    v4  [   0   0   0   1   1   1  ]
    v5  [   0   0   0   0   0   1  ]

(a) Graph $G_{13}$ (b) Incidence matrix of $G_{13}$


Fig. 9.21 Incidence matrix representation of a graph 
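An incidence matrix can be built column by column from the edge list, as in the sketch below. The function name `incidence_matrix` and the small example graph (a triangle with one pendant edge) are assumptions made for illustration only.

```python
def incidence_matrix(n, edges):
    """Build the n x e incidence matrix of an undirected graph on vertices 1..n.

    edges[j] is the (j+1)-th edge, given as a pair of end vertices.
    """
    m = [[0] * len(edges) for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        m[u - 1][j] = 1      # edge j is incident on vertex u
        m[v - 1][j] = 1      # ... and on vertex v
    return m

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
for row in incidence_matrix(4, edges):
    print(row)
```

Every column of the result contains exactly two 1s (one per end vertex), and the sum of a row is the degree of the corresponding vertex.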




Circuit matrix representation 


For a graph G let the number of different circuits be t and the number of edges be e. Then the circuit matrix $C = [c_{ij}]$ of G is a t x e matrix defined as

$c_{ij} = 1$ if the $i$th circuit includes the $j$th edge
$c_{ij} = 0$ otherwise.


Example Consider the graph $G_{14}$ shown in Fig. 9.22(a). The circuits for this graph expressed in terms of their edges are 1: $\{e_1, e_2, e_3\}$, 2: $\{e_3, e_4, e_5\}$, 3: $\{e_1, e_2, e_5, e_4\}$. The circuit matrix C of order 3 x 6 is shown in Fig. 9.22(b).

           e1  e2  e3  e4  e5  e6
     1  [   1   1   1   0   0   0  ]     Circuit 1: $\{e_1, e_2, e_3\}$
C =  2  [   0   0   1   1   1   0  ]     Circuit 2: $\{e_3, e_4, e_5\}$
     3  [   1   1   0   1   1   0  ]     Circuit 3: $\{e_1, e_2, e_5, e_4\}$

(a) Graph $G_{14}$ (b) Circuit matrix of $G_{14}$


Fig. 9.22 Circuit matrix representation of a graph 


Cut set matrix representation 


For a graph G, a matrix $S = [s_{ij}]$ whose rows correspond to cut sets and columns correspond to edges of the graph is defined to be a cut set matrix if

$s_{ij} = 1$ if the $i$th cut set contains the $j$th edge
$s_{ij} = 0$ otherwise.


Example Consider the graph $G_{15}$ shown in Fig. 9.23(a). The cut sets of the graph are 1: $\{e_4\}$, 2: $\{e_1, e_2\}$, 3: $\{e_2, e_3\}$ and 4: $\{e_1, e_3\}$. The cut set matrix representation is shown in Fig. 9.23(b).

           e1  e2  e3  e4
     1  [   0   0   0   1  ]     1: $\{e_4\}$
S =  2  [   1   1   0   0  ]     2: $\{e_1, e_2\}$
     3  [   0   1   1   0  ]     3: $\{e_2, e_3\}$
     4  [   1   0   1   0  ]     4: $\{e_1, e_3\}$

(a) Graph $G_{15}$ (b) Cut set matrix of $G_{15}$


Fig. 9.23 Cut set matrix representation of a graph 
Path matrix representation 


A path matrix is generally defined for a specific pair of vertices in a graph. If (u, v) is a pair of vertices then the path matrix denoted as $P(u, v) = [p_{ij}]$ is given by

$p_{ij} = 1$ if the $j$th edge lies in the $i$th path between vertices u and v
$p_{ij} = 0$ otherwise.







Example Consider the graph $G_{16}$ shown in Fig. 9.24(a). The paths between vertices $v_1$ and $v_4$ are 1: $\{e_2, e_4\}$ and 2: $\{e_1, e_3, e_4\}$. The path matrix representation is shown in Fig. 9.24(b).

                     e1  e2  e3  e4          Paths:
$P(v_1, v_4)$ =  1 [  0   1   0   1  ]       1: $\{e_2, e_4\}$
                 2 [  1   0   1   1  ]       2: $\{e_1, e_3, e_4\}$

(a) Graph $G_{16}$ (b) Path matrix between $v_1$, $v_4$ of $G_{16}$


Fig. 9.24 Path matrix representation 


Of all these sequential representations, the adjacency matrix representation represents the graph best and is the most widely used. The adjacency matrix A of a graph G with n vertices has an order of n x n. As a consequence, graph algorithms which make use of the adjacency matrix representation are bound to report a time complexity of $O(n^2)$, since at least $n^2 - n$ entries (excluding the diagonal elements) are to be examined.


Linked representation of graphs 


The linked representation of graphs is referred to as the adjacency list representation and is comparatively more efficient than the adjacency matrix representation.

Given a graph G with n vertices and e edges, the adjacency list opens n head nodes 
corresponding to the n vertices of graph G, each of which points to a singly linked list of nodes, 
which are adjacent to the vertex representing the head node. 


Example Figure 9.25 illustrates a graph and its adjacency list representation.
It can easily be seen that if the graph is undirected, then the number of nodes in the singly linked lists put together is 2e, whereas in the case of digraphs the number of nodes is just e, where e is the number of edges in the graph.

In contrast to adjacency matrix representations, graph algorithms which make use of an 
adjacency list representation would generally report a time complexity of O(n + e) or O(n + 2e) 
based on whether the graph is directed or undirected respectively, thereby rendering them 
efficient. 


(a) Graph $G_{17}$ (b) Adjacency list representation of $G_{17}$, with its head nodes
Fig. 9.25 Adjacency list representation of a graph 
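The adjacency list idea can be sketched with Python's built-in containers. In this illustration a dictionary entry plays the role of a head node and a Python list stands in for the singly linked list described above; the function name `adjacency_list` and the example edges are assumptions.

```python
def adjacency_list(n, edges, directed=False):
    """Map each vertex 1..n (a head node) to the list of its adjacent vertices."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)     # an undirected edge appears in both lists
    return adj

edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
undirected = adjacency_list(4, edges)
directed = adjacency_list(4, edges, directed=True)
print(undirected)   # {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(sum(len(lst) for lst in undirected.values()))   # 8, i.e. 2e
print(sum(len(lst) for lst in directed.values()))     # 4, i.e. e
```

The two totals reproduce the 2e and e node counts stated for undirected graphs and digraphs respectively.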










Graph Traversals 9.4 


Just as tree data structures support traversals of Inorder, Preorder and Postorder, graphs support 
the following traversals: 

Breadth first Traversal, and 

Depth first Traversal. 
A traversal, to recall, is a systematic walk which visits the nodes comprising the data structure 
(graphs in this case) in a specific order. 


Breadth first traversal 


We discuss the breadth first traversal of an undirected graph in this section. The traversal starts from a vertex u which is said to be visited. Now all nodes $v_i$ adjacent to u are visited. The unvisited vertices $w_{ij}$ adjacent to each of $v_i$ are visited next and so on. The traversal terminates when there are no more nodes to visit. The process calls for the maintenance of a queue to keep track of the order of nodes whose adjacent nodes are to be visited.

Algorithm 9.1 illustrates the procedure for breadth first traversal of a graph G. 


Algorithm 9.1: Breadth first traversal 


Procedure BFT(s)
/* s is the start vertex of the traversal in an undirected graph G */
/* Q is a queue which keeps track of the vertices whose adjacent nodes
   are to be visited */
/* Vertices which have been visited have their 'visited' flags set to
   1, (i.e.) visited(vertex) = 1.
   Initially, visited(vertex) = 0 for all vertices of graph G */

    Initialize queue Q;
    visited(s) = 1;
    call ENQUEUE(Q, s);                   /* insert s into Q */
    while not EMPTYQUEUE(Q) do            /* process until Q is empty */
        call DEQUEUE(Q, s);               /* delete s from Q */
        print(s);                         /* output vertex visited */
        for all vertices v adjacent to s do
            if (visited(v) = 0) then
               { call ENQUEUE(Q, v);
                 visited(v) = 1; }
        end
    endwhile
end BFT.
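A minimal Python sketch of the same idea follows. It is not a transcription of the procedure above: the function name `bft`, the dictionary-of-lists adjacency structure and the small example graph are assumptions made for illustration (the graph is not the one of Fig. 9.26).

```python
from collections import deque

def bft(adj, s):
    """Return the vertices reachable from s in breadth first order.

    adj maps each vertex to the list of its adjacent vertices.
    """
    visited = {s}                 # vertices whose 'visited' flag is set
    order = []                    # traversal output
    q = deque([s])
    while q:                      # process until the queue is empty
        u = q.popleft()           # dequeue
        order.append(u)           # output the vertex
        for v in adj[u]:
            if v not in visited:
                visited.add(v)    # flag on enqueue, so no vertex is queued twice
                q.append(v)
    return order

graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(bft(graph, 1))              # [1, 2, 3, 4, 5]
```

Marking a vertex as visited at the moment it is enqueued, rather than when it is dequeued, is what keeps each vertex from entering the queue more than once.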


Breadth first traversal, as its name indicates, traverses the successors of the start node generation after generation in a horizontal or linear fashion. This "breadth wise" traversal is clearly visible when the traversal is worked over a graph represented as an adjacency list.

Example 9.1 illustrates the breadth first traversal of an undirected graph represented as an 
adjacency list. 







Example 9.1 Consider the undirected graph G shown in Fig. 9.26(a) and its adjacency list representation shown in Fig. 9.26(b). The trace of procedure BFT(1), where the start vertex is 1, is shown in Table 9.1.


[Figure: (a) graph G; (b) adjacency list of graph G]

Fig. 9.26 A graph and its adjacency list representation to demonstrate breadth first traversal





Table 9.1 Trace of the Breadth first traversal procedure on graph G (Fig. 9.26) 


Current             Traversal output          Status of visited flag of vertices
vertex                                        {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} of graph G

                                              1  2  3  4  5  6  7  8  9  10
1 (Start vertex)                              1  0  0  0  0  0  0  0  0  0
1                   1                         1  0  0  0  1  1  1  0  0  0
5                   1 5                       1  0  0  1  1  1  1  0  1  0
6                   1 5 6                     1  0  0  1  1  1  1  0  1  1
7                   1 5 6 7                   1  0  1  1  1  1  1  0  1  1
4                   1 5 6 7 4                 1  1  1  1  1  1  1  0  1  1
9                   1 5 6 7 4 9               1  1  1  1  1  1  1  0  1  1
10                  1 5 6 7 4 9 10            1  1  1  1  1  1  1  1  1  1
3                   1 5 6 7 4 9 10 3          1  1  1  1  1  1  1  1  1  1
2                   1 5 6 7 4 9 10 3 2        1  1  1  1  1  1  1  1  1  1
8                   1 5 6 7 4 9 10 3 2 8      1  1  1  1  1  1  1  1  1  1
                    1 5 6 7 4 9 10 3 2 8      Breadth first traversal ends





The breadth first traversal starts from vertex 1 and visits vertices 5, 6, 7 which are adjacent to it, while enqueuing them into queue Q. In the next step, vertex 5 is dequeued and its adjacent, unvisited vertices 4, 9 are visited next and so on. The process continues until the queue Q which keeps track of the adjacent vertices is empty.

If an adjacency matrix representation had been used, the time complexity of the algorithm would have been $O(n^2)$ since, for each vertex visited, the while loop incurs a time complexity of O(n) in scanning a row of the matrix. On the other hand, the use of an adjacency list only calls for the examination of those nodes which are adjacent to the given node, thereby curtailing the time complexity of the loop to O(e).


Depth first traversal 


In this section we discuss the depth first traversal of an undirected graph. The traversal starts from a vertex u which is said to be visited. Now, all the nodes $v_i$ adjacent to vertex u are collected and the first occurring vertex $v_1$ is visited, deferring the visits to other vertices. The nodes adjacent to $v_1$, viz., $w_{1k}$, are collected and the first occurring adjacent vertex, viz., $w_{11}$, is visited, deferring the visit to other adjacent nodes and so on. The traversal progresses until there are no more visits possible.

Algorithm 9.2 illustrates a recursive procedure to perform the depth first traversal of graph G. 


Algorithm 9.2: Depth first traversal 


Procedure DFT(s)
/* s is the start vertex */
    visited(s) = 1;
    print(s);                             /* output the visited vertex */
    for each vertex v adjacent to s do
        if visited(v) = 0 then call DFT(v);
    end
end DFT
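The recursive procedure can be sketched in Python as follows. As before, this is an illustrative sketch only: the name `dft`, the dictionary-of-lists graph and the example graph are assumptions, and the graph is not the one of Fig. 9.26.

```python
def dft(adj, s):
    """Return the vertices reachable from s in depth first order.

    adj maps each vertex to the list of its adjacent vertices.
    """
    visited = set()
    order = []

    def visit(u):
        visited.add(u)            # set the 'visited' flag
        order.append(u)           # output the vertex
        for v in adj[u]:
            if v not in visited:  # calls on visited vertices are never made
                visit(v)

    visit(s)
    return order

graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(dft(graph, 1))              # [1, 2, 4, 3, 5]
```

On this graph a breadth first traversal from vertex 1 would output 1 2 3 4 5, whereas the depth first traversal goes down through 2 and 4 before reaching 3. For very deep graphs the recursion may exceed Python's default recursion limit, in which case an explicit stack is the usual alternative.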





The depth first traversal, as its name indicates, visits at each node the first occurring unvisited node among its adjacent nodes and successively repeats the operation, thus moving 'deeper and deeper' into the graph. In contrast, breadth first traversal moves sideways or breadthways in the graph.
Example 9.2 illustrates a depth first traversal of an undirected graph.







Example 9.2 Consider the undirected graph G and its adjacency list representation shown 
in Fig. 9.26. Figure 9.27 shows a tree of recursive calls which represents a trace of the procedure 


DFT(1) on the graph G with start vertex 1. 


[Figure: tree of recursive calls. Executed calls, in order: DFT(1), DFT(5), DFT(4), DFT(2), DFT(9), DFT(7), DFT(3), DFT(8), DFT(10), DFT(6), each setting the VISITED flag of its vertex to 1. Non executed calls (broken line boxes): DFT(6) and DFT(7) from DFT(1), and DFT(9) from DFT(5). Solid arrows show the flow of execution; broken arrows show the back tracking of calls during recursion. Traversal output: 1 5 4 2 9 7 3 8 10 6]

Fig. 9.27 Tree of recursive calls showing the trace of procedure DFT(1) on graph G (Fig. 9.26) 




The tree of recursive calls illustrates the working of the DFT procedure. The first call DFT(1) visits start vertex 1 and releases 1 as the traversal output. Vertex 1 has vertices 5, 6, 7 as its adjacent nodes. DFT(1) now invokes DFT(5), visiting vertex 5 and releasing it as the next traversal output. However, DFT(6) and DFT(7) are kept waiting for DFT(5) to complete its execution. Such procedure calls waiting to be executed are shown in broken line boxes in the tree of recursive calls.

Now DFT(5) invokes DFT(4), releasing vertex 4 as the traversal output while DFT(9) is kept in abeyance. Note that though vertex 1 is an adjacent node of vertex 5, DFT(1) is not called again, since no DFT() calls are invoked on vertices already visited. The process continues until DFT(6) completes its execution with no more nodes left to visit. During recursion the calls made to the DFT() procedure are indicated using solid arrows in the forward direction.

Once DFT(6) finishes execution, back tracking takes place, which is indicated using broken arrows in the reverse direction. Once DFT(1) completes execution the traversal output is gathered to be 1 5 4 2 9 7 3 8 10 6.

In the adjacency list representation of graph G, DFT() examines each node in the adjacency list at most once. Since there are 2e list nodes, the time complexity turns out to be O(e). On the other hand, the adjacency matrix implementation for procedure DFT() records a time complexity of $O(n^2)$.

Both breadth first and depth first traversals, irrespective of the vertex they start from, visit all vertices of the graph that are connected to it. Hence, if the graph is connected, both traversals would visit all the vertices of the graph. On the other hand, if the graph were not connected, both traversals would yield only the connected component containing the start vertex. Thus breadth first traversal and depth first traversal can be useful in testing for the connectivity of graphs. If, after executing the traversal algorithms, there are any vertices which are left unvisited, then it implies that the graph is disconnected.
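The connectivity test described above can be sketched by restarting a traversal from every vertex that is still unvisited; each restart yields one connected component. The function name `connected_components` and the example graph are hypothetical, and an explicit stack is used in place of recursion.

```python
def connected_components(adj):
    """Return the connected components of an undirected graph as sorted lists.

    adj maps each vertex to the list of its adjacent vertices.
    """
    visited = set()
    components = []
    for s in adj:
        if s in visited:
            continue                  # s already belongs to an earlier component
        component = []
        stack = [s]
        visited.add(s)
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    stack.append(v)
        components.append(sorted(component))
    return components

graph = {1: [2], 2: [1], 3: [4], 4: [3], 5: []}
parts = connected_components(graph)
print(parts)             # [[1, 2], [3, 4], [5]]
print(len(parts) == 1)   # False: the graph is disconnected
```

A graph is connected exactly when the function returns a single component.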


Applications 9.5 





We illustrate the application of graphs for 
(i) Determination of shortest path (Single Source Shortest path problem), and 
(ii) Extraction of minimum cost spanning trees. 


Single-source, shortest-path problem 


Given a network of cities and the distances between them, the objective of the single-source, 
shortest-path problem is to find the shortest path from a city (termed source) to all other cities 
connected to it. 

The network of cities with their distances is represented as a weighted digraph. Algorithm 9.3 
illustrates Dijkstra’s algorithm for the single source shortest path problem. 

Let V be a set of N cities (vertices) of the digraph. Here the source is city 1. The set T is initialized to city 1. The DISTANCE vector, DISTANCE[2:N], initially records the distances of cities 2 to N connected to the source by an edge (not path!). If there is no edge directly connecting the city to the source, then we initialize its DISTANCE value to ∞. Once the algorithm completes its iterations, the DISTANCE vector holds the shortest distance of cities 2 to N from the source city 1.







It is convenient to represent the weighted digraph using its cost matrix $COST_{N \times N}$. The cost matrix records the distances between cities connected by an edge.

Dijkstra's algorithm has a complexity of $O(N^2)$ where N is the number of vertices (cities) in the weighted digraph.


Algorithm 9.3: Dijkstra’s algorithm for the single source shortest path problem 
Procedure DIJKSTRA_SSSP(N, COST)

/* N is the number of vertices labeled {1, 2, 3, ..., N} of the weighted
   digraph. COST[1:N, 1:N] is the cost matrix of the graph. If there is
   no edge then COST[i, j] = ∞ */

/* The procedure computes the cost of the shortest path from vertex
   1, the source, to every other vertex of the weighted digraph */

    T = {1};                              /* initialize T to the source vertex */
    for i = 2 to N do
        DISTANCE[i] = COST[1, i];         /* initialize the DISTANCE vector
    end                                      to the cost of the edges connecting
                                             vertex i with the source vertex 1. If
                                             there is no edge then COST[1, i] = ∞ */
    for i = 1 to N-1 do
        Choose a vertex u in V - T such that DISTANCE[u]
        is a minimum;
        Add u to T;
        for each vertex w in V - T do
            DISTANCE[w] = minimum(DISTANCE[w],
                                  DISTANCE[u] + COST[u, w]);
        end
    end

end DIJKSTRA_SSSP
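The procedure can be sketched in Python as below. This is a minimal $O(N^2)$ illustration, not the book's own code: the function name `dijkstra_sssp` is hypothetical, vertices are 0-based (so vertex 1 of the text is index 0), and the cost matrix shown is a reading of the matrix of Fig. 9.28(b), which should be checked against the figure.

```python
import math

INF = math.inf

def dijkstra_sssp(cost):
    """Shortest distances from vertex 0 (the source) to every vertex.

    cost is an N x N matrix; cost[i][j] is INF when there is no edge i -> j.
    """
    n = len(cost)
    in_t = [False] * n
    in_t[0] = True                    # T = {source}
    dist = cost[0][:]                 # DISTANCE starts as the direct-edge costs
    for _ in range(n - 1):
        # Choose u in V - T with minimum DISTANCE.
        u = min((v for v in range(n) if not in_t[v]), key=lambda v: dist[v])
        in_t[u] = True                # add u to T
        for w in range(n):
            if not in_t[w]:
                dist[w] = min(dist[w], dist[u] + cost[u][w])
    return dist

cost = [
    [0,   20,  INF, 40,  110],
    [INF, 0,   60,  INF, INF],
    [INF, INF, 0,   INF, 20],
    [INF, INF, 30,  0,   70],
    [INF, INF, INF, INF, 0],
]
print(dijkstra_sssp(cost)[1:])        # [20, 70, 40, 90]
```

The linear scan for the minimum inside a loop of N - 1 iterations is what gives the $O(N^2)$ bound stated above.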


Example 9.3 Consider the weighted digraph of cities and its cost matrix shown in 
Fig. 9.28. Table 9.2 shows the trace of the Dijkstra’s algorithm. 





          1     2     3     4     5
    1     0    20     ∞    40   110
    2     ∞     0    60     ∞     ∞
    3     ∞     ∞     0     ∞    20
    4     ∞     ∞    30     0    70
    5     ∞     ∞     ∞     ∞     0

(a) Weighted digraph (source: vertex 1) (b) Cost matrix $COST_{5 \times 5}$


Fig. 9.28 A weighted digraph and its cost matrix 







Table 9.2 Trace of Dijkstra’s algorithm on the weighted digraph (Fig. 9.28) 


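Iteration         T                    DISTANCE
                                       [2]    [3]    [4]    [5]

Initialization    {1}                  20     ∞      40     110
1                 {1, 2}               20     80     40     110
2                 {1, 2, 4}            20     70     40     110
3                 {1, 2, 4, 3}         20     70     40     90
4                 {1, 2, 4, 3, 5}      20     70     40     90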




The DISTANCE vector in the last iteration records the shortest distance of the vertices {2, 3, 4, 5} from the source vertex 1.

To reconstruct the shortest path from the source vertex to all other vertices, a vector PREDECESSOR[1:N], where PREDECESSOR[v] records the predecessor of vertex v in the shortest path, is maintained. PREDECESSOR[v] is initialized to source for all v ≠ source. PREDECESSOR[v] is updated by inserting the statement


if (DISTANCE[u] + COST[u, w] < DISTANCE[w])
then PREDECESSOR[w] = u


just before DISTANCE[w] = minimum(DISTANCE[w], DISTANCE[u] + COST[u, w]) is computed in procedure DIJKSTRA_SSSP, since the comparison must be made against the value of DISTANCE[w] before it is updated. To trace the shortest path we move backwards from the destination vertex, hopping on the predecessors recorded by the PREDECESSOR vector until the source vertex is reached.


Example 9.4 To trace the shortest paths of the vertices from vertex 1 using Dijkstra’s 
algorithm, inclusion of the statement updating PREDECESSOR vector results in the following: 











Iteration         PREDECESSOR
                  [2]    [3]    [4]    [5]

Initialization    1      1      1      1
1                 1      2      1      1
2                 1      4      1      1
3                 1      4      1      3
4                 1      4      1      3


To trace the shortest path from source 1 to vertex 5, we move in the reverse direction from 
vertex 5 (shown in dotted lines) hopping on the predecessors until the source vertex is reached. 
The shortest path is given by: 


PREDECESSOR(5) = 3, PREDECESSOR(3) = 4, PREDECESSOR(4) = 1

Vertex 5 → Vertex 3 → Vertex 4 → Source vertex 1


Thus the shortest path between vertex 1 and vertex 5 is 1 → 4 → 3 → 5 and the distance, given by DISTANCE[5], is 90.
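The bookkeeping with the PREDECESSOR vector can be sketched as follows. The function names `dijkstra_with_predecessors` and `trace_path` are hypothetical, indices are 0-based, the cost matrix is again a reading of Fig. 9.28(b), and `trace_path` assumes the destination is reachable from the source.

```python
import math

INF = math.inf

def dijkstra_with_predecessors(cost, source=0):
    """Return (dist, pred) where pred[w] precedes w on a shortest path from source."""
    n = len(cost)
    in_t = [False] * n
    in_t[source] = True
    dist = cost[source][:]
    pred = [source] * n               # every vertex initially points at the source
    for _ in range(n - 1):
        u = min((v for v in range(n) if not in_t[v]), key=lambda v: dist[v])
        in_t[u] = True
        for w in range(n):
            # Compare against the old DISTANCE[w] before overwriting it.
            if not in_t[w] and dist[u] + cost[u][w] < dist[w]:
                dist[w] = dist[u] + cost[u][w]
                pred[w] = u
    return dist, pred

def trace_path(pred, source, dest):
    """Hop backwards over predecessors from dest until the source is reached."""
    path = [dest]
    while path[-1] != source:
        path.append(pred[path[-1]])
    return path[::-1]

cost = [
    [0,   20,  INF, 40,  110],
    [INF, 0,   60,  INF, INF],
    [INF, INF, 0,   INF, 20],
    [INF, INF, 30,  0,   70],
    [INF, INF, INF, INF, 0],
]
dist, pred = dijkstra_with_predecessors(cost)
path = trace_path(pred, 0, 4)
print([v + 1 for v in path], dist[4])   # [1, 4, 3, 5] 90
```

Adding 1 to each index converts back to the vertex numbering used in the text.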







Minimum cost spanning trees 


Consider an application where n stations are to be linked using a communication network. The 
laying of communication links between any two stations involves a cost. The problem is to obtain 
a network of communication links which while preserving the connectivity between stations does 
it with minimum cost. If the problem were to be modeled as a weighted graph, the ideal solution 
to the problem would be to extract a subgraph termed minimum cost spanning tree which while 
preserving the connectedness of the graph yields minimum cost. 

Let G = (V, E) be an undirected connected graph. A subgraph T = (V, E’) of G is a spanning tree
of G iff T is a tree.


Example For the connected undirected graph shown in Fig. 9.29(a), some of the spanning
trees extracted from the graph are shown in Fig. 9.29(b).


(a) Connected undirected graph


(b) Spanning trees 


Fig. 9.29 Spanning trees of a graph 


Given a connected undirected graph there are several spanning trees that may be extracted 
from it. Now given G = (V, E) to be a connected, weighted undirected graph where each edge 
involves a cost, the extraction of a spanning tree extends itself to the extraction of a minimum cost 
spanning tree. A minimum cost spanning tree is a spanning tree which has a minimum total cost. 
Algorithm 9.4 illustrates Prim’s algorithm for the extraction of minimum cost spanning trees.

A spanning tree of a graph G with n vertices will have (n − 1) edges. This is due to the property
that a tree with n vertices always has (n − 1) edges (Refer Illustrative Problem 9.9). Also, addition
of even one single edge results in the spanning tree losing its property of acyclicity and removal
of one single edge results in its losing the property of connectivity.

The time complexity of Prim’s algorithm is O(n²).


Example 9.5 Consider the connected, weighted, undirected graph shown in Fig. 9.30. Table 9.3
illustrates the trace of Prim’s algorithm on the graph.





Fig. 9.30 A connected weighted undirected graph for the extraction of minimum cost spanning
tree using Prim’s algorithm


Algorithm 9.4: Prim’s algorithm to obtain the minimum cost spanning tree from a
connected undirected graph


procedure PRIM(G)
/* G = (V, E) is a weighted, connected undirected graph and E’ is
   the set of edges which are to be extracted to obtain the minimum
   cost spanning tree */

   E’ = ∅;                    /* Initialize E’ to the empty set */
   Select a minimum cost edge (u, v) from E;
   V’ = {u}                   /* Include u in V’ */

   while V’ ≠ V do
      Let (u, v) be the lowest cost edge such that u is in V’
         and v is in V − V’;
      Add edge (u, v) to set E’;
      Add v to set V’;
   endwhile
end PRIM
Table 9.3 Trace of Prim’s algorithm on the connected weighted undirected
graph (Fig. 9.30)

Vertices V’          | Spanning tree E’
{1, 3}               | {(1, 3)}
{1, 3, 6}            | {(1, 3), (3, 6)}
{1, 3, 6, 4}         | {(1, 3), (3, 6), (6, 4)}
{1, 3, 6, 4, 2}      | {(1, 3), (3, 6), (6, 4), (3, 2)}
{1, 3, 6, 4, 2, 5}   | {(1, 3), (3, 6), (6, 4), (3, 2), …}





We first initialize V’ to a vertex of the lowest cost edge of the graph G. Then with each iteration
we look for a lowest cost edge that has one of its end points in V’ and the other outside it, which
ensures that the edge chosen preserves the connectedness and acyclicity required of a
spanning tree. Once V’ = V the algorithm terminates, having obtained the minimum cost spanning tree.
In this example the minimum cost spanning tree has a cost of 15.
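
The steps of procedure PRIM can be sketched in Python as below. This is only an illustration under stated assumptions: the edge list is a small hypothetical graph (not the graph of Fig. 9.30), the function name prim_mst is invented for the sketch, and the candidate edges are rescanned on every iteration for clarity rather than speed.

```python
def prim_mst(vertices, edges):
    """Return (tree_edges, total_cost) for a connected, weighted, undirected graph.

    edges is a list of (u, v, cost) triples; each undirected edge is listed once.
    """
    # Start from an end point of a minimum cost edge, as in procedure PRIM.
    start = min(edges, key=lambda e: e[2])[0]
    in_tree = {start}
    tree_edges = []
    while in_tree != set(vertices):
        # Candidate edges have exactly one end point inside the tree,
        # so adding one can never create a cycle.
        candidates = [(u, v, c) for (u, v, c) in edges
                      if (u in in_tree) != (v in in_tree)]
        u, v, c = min(candidates, key=lambda e: e[2])
        tree_edges.append((u, v, c))
        in_tree.update((u, v))
    return tree_edges, sum(c for _, _, c in tree_edges)


# Hypothetical 5-vertex graph used only for illustration.
EDGES = [(1, 2, 4), (1, 3, 1), (2, 3, 2), (2, 4, 5), (3, 4, 8), (4, 5, 3)]
tree, total = prim_mst([1, 2, 3, 4, 5], EDGES)
print(tree, total)
```

The tree returned has one edge fewer than the number of vertices, as expected of a spanning tree.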


ADT for Graphs 


Data objects: 
A graph G of vertices and edges. Vertices represent data. 


Operations:
• Check if graph G is empty
     CHECK_GRAPH_EMPTY(G) (Boolean function)
• Insert an isolated vertex V into a graph G. Ensure that V does not
  exist in G before insertion.
     INSERT_VERTEX(G, V)
• Insert an edge connecting vertices U, V into a graph G. Ensure that such
  an edge does not exist in G before insertion.
     INSERT_EDGE(G, U, V)
• Delete vertex V and all the edges incident on it from the graph G. Ensure
  that such a vertex exists in the graph before deletion.
     DELETE_VERTEX(G, V)
• Delete an edge from the graph G connecting the vertices U, V. Ensure
  that such an edge exists before deletion.
     DELETE_EDGE(G, U, V)
• Store ITEM into a vertex V of graph G
     STORE_DATA(G, V, ITEM)
• Retrieve data of a vertex V in the graph G and return it in ITEM
     RETRIEVE_DATA(G, V, ITEM)
• Perform Breadth first traversal of a graph G.
     BFT(G)
• Perform Depth first traversal of a graph G.
     DFT(G)
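
One possible realisation of this ADT for undirected graphs is sketched below in Python. It is an assumption-laden illustration: the class name, the use of adjacency sets, and the start-vertex argument of the two traversals are choices made for the sketch and are not prescribed by the ADT.

```python
from collections import deque


class Graph:
    """Undirected graph ADT sketch backed by adjacency sets."""

    def __init__(self):
        self._adj = {}    # vertex -> set of neighbouring vertices
        self._data = {}   # vertex -> stored item

    def is_empty(self):
        return not self._adj

    def insert_vertex(self, v):
        if v in self._adj:
            raise ValueError(f"vertex {v!r} already exists")
        self._adj[v] = set()

    def insert_edge(self, u, v):
        if v in self._adj[u]:
            raise ValueError(f"edge ({u!r}, {v!r}) already exists")
        self._adj[u].add(v)
        self._adj[v].add(u)

    def delete_vertex(self, v):
        # Raises KeyError if v is absent; removes all edges incident on v.
        for w in self._adj.pop(v):
            if w != v:
                self._adj[w].discard(v)
        self._data.pop(v, None)

    def delete_edge(self, u, v):
        # Raises KeyError if the edge is absent.
        self._adj[u].remove(v)
        self._adj[v].remove(u)

    def store_data(self, v, item):
        if v not in self._adj:
            raise KeyError(v)
        self._data[v] = item

    def retrieve_data(self, v):
        return self._data[v]

    def bft(self, start):
        """Breadth first traversal; neighbours are visited in sorted order."""
        order, seen, queue = [], {start}, deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in sorted(self._adj[u]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        return order

    def dft(self, start, seen=None):
        """Depth first traversal; neighbours are visited in sorted order."""
        seen = set() if seen is None else seen
        seen.add(start)
        order = [start]
        for w in sorted(self._adj[start]):
            if w not in seen:
                order.extend(self.dft(w, seen))
        return order


g = Graph()
for name in "abcd":
    g.insert_vertex(name)
for u, v in [("a", "b"), ("a", "c"), ("b", "d")]:
    g.insert_edge(u, v)
g.store_data("a", 42)
print(g.bft("a"), g.dft("a"), g.retrieve_data("a"))
```

A traversal that visits every vertex from a single start vertex indicates that the graph is connected.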







> Graphs are non-linear data structures. Graph theory originated with the
classical Koenigsberg bridge problem.

> A graph G = (V, E) consists of a finite set of vertices V and edges E. Undirected graph, 
digraph, complete graph, subgraph, tree, isomorphic graphs and labeled graphs are graphs 
which satisfy special properties. 

> Path, simple path, cycle, degree, cut set, pendant vertex, Eulerian walk, Hamiltonian circuit 
are terminologies associated with graphs. 

> For problem solving using computers, graphs are represented using two popular methods 
viz., adjacency matrix representation and adjacency list representation which belong to the 
class of sequential and linked representations respectively. 

> The other matrix representations for graphs are incidence matrix, circuit matrix, cut set 
matrix and path matrix. 

> Graphs support the traversals of breadth first and depth first. The traversal techniques can 
be employed to test for the connectedness of the graph. 

> Two applications of graphs viz., single source shortest path problem and extraction of 
minimal spanning trees have been discussed. 


Illustrative Problems


Problem 9.1 Draw the graphs:
G1: V1 = {a, b, c, d} E1 = {<a, b>, <b, c>, <d, c>, <c, a>}
G2: V2 = {a, b, c, d} E2 = {(a, b), (b, c), (a, d), (b, d)}


Solution: Figure I 9.1 illustrates the graphs. Here G1 is a digraph and G2 is an undirected graph.


Fig. I 9.1


Problem 9.2 For the graph given in Fig. I 9.2 find

(i) an isolated vertex
(ii) degree of node b
(iii) a simple path from a to c
(iv) a path which is not simple from a to c
(v) a cycle
(vi) a pendant vertex

Fig. I 9.2



Solution: 
(i) e 
(ii) 2 
(iii) {a, d, c} or {a, b, d, c} 
(iv) {a, b, d, a, c} 
(v) {a, b, d} 
(vi) f 


Problem 9.3 For the graph given in Fig. I 9.3
(a) obtain
(i) The cut sets
(ii) An Eulerian walk
(iii) A Hamiltonian path
(b) Is the graph complete?

Fig. I 9.3

Solution:
(a) (i) The cut set is {C} since removal of the vertex with all the edges incident on it disconnects
the graph.
(ii) No, an Eulerian walk does not exist since not all nodes have even degree.
(iii) A Hamiltonian path is given by DBCAE
(b) No, the graph is not complete since not all of the ⁵C₂ = 10 possible edges are present.


Problem 9.4 Represent the graph shown in Fig. I 9.3 using 
(i) Adjacency matrix 

(ii) Adjacency list and 

(iii) Incidence matrix 


Solution:
(i) Adjacency matrix

     A  B  C  D  E
A    0  0  1  0  1
B    0  0  1  1  0
C    1  1  0  1  0
D    0  1  1  0  0
E    1  0  0  0  0

(ii) Adjacency list

A → C → E
B → C → D
C → A → B → D
D → B → C
E → A

(iii) Incidence matrix

[Matrix: rows A to E and one column per edge; each column holds a 1 in the rows of the two end vertices of that edge, e.g. the first column is (1, 0, 1, 0, 0) for the edge joining A and C.]


Problem 9.5 Show that if d_i is the degree of vertex i in a graph G with n vertices and e edges
then

$$\sum_{i=1}^{n} d_i = 2e$$


Solution: Since each edge contributes to the degree of its two end vertices,

$$d_1 + d_2 + \cdots + d_n = 2e$$

(i.e.)

$$e = \frac{d_1 + d_2 + \cdots + d_n}{2}$$

Hence, the result.


Problem 9.6 With the help of the result $\sum_{i=1}^{n} d_i = 2e$ (proved in Illustrative Problem 9.5)
show that the number of vertices of odd degree is even.


Solution:

$$\sum_{i=1}^{n} d_i = \sum_{\text{odd degree vertices}} d_i + \sum_{\text{even degree vertices}} d_i = 2e$$

$$\Rightarrow \sum_{\text{odd degree vertices}} d_i = 2e - \sum_{\text{even degree vertices}} d_i = \text{an even number } c$$

since a sum of even numbers is even. As each d_i of the summation over the odd degree vertices
is an odd number, for the summation to be an even number (denoted as c), the number of terms
must be even. Hence the number of vertices of odd degree is even.
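
Both results can be checked numerically on a small graph. The Python sketch below uses the adjacency matrix of the graph of Fig. I 9.3 as read from Illustrative Problem 9.4; that reading is an assumption wherever the printed matrix is unclear, and the variable names are chosen only for this illustration.

```python
# Adjacency matrix for vertices A..E, as read from Illustrative Problem 9.4
# (an assumed reading where the source is unclear).
VERTICES = "ABCDE"
ADJ = [
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0],
]

# Degree of a vertex = number of 1s in its row.
degrees = {v: sum(row) for v, row in zip(VERTICES, ADJ)}

# Each undirected edge appears twice in the matrix, so count the upper triangle.
edge_count = sum(ADJ[i][j] for i in range(5) for j in range(i + 1, 5))

odd_vertices = [v for v, d in degrees.items() if d % 2 == 1]

print(degrees)       # degree of each vertex
print(edge_count)    # number of edges e
print(odd_vertices)  # vertices of odd degree
```

The sum of the degrees equals twice the edge count, and the list of odd degree vertices has an even number of entries.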







Problem 9.7 There is one and only one path between every pair of vertices in a tree T. 
Prove. 


Solution: Since T is a tree it is connected, so at least one path exists between every pair of
vertices. If there were more than one path between a pair of distinct vertices v_i, v_j in T, the
two paths together would contain a circuit, and T would no longer be a tree. Therefore there
exists one and only one path between every pair of vertices in a tree T.


Problem 9.8 If in a graph G there is one and only one path between every pair of vertices
then G is a tree. 


Solution: For G to be a tree, (i) G should be connected and (ii) G should have no cycles. 
(i) is true since there exists a path between every pair of vertices 
(ii) is also true since there is one and only one path between every pair of vertices which 
ensures absence of cycles (Illustrative Problem 9.7) 
Hence G is a tree. 


Problem 9.9 A tree T with n vertices has (n — 1) edges. Prove. 


Solution: We prove this by induction. 

For n =1, a tree has no edge or has (1 — 1) = 0 edges. 

For n =2, a tree has one edge or has (2 — 1) = 1 edge. 
The statement is therefore true for n = 1 and n = 2. Let us suppose the theorem holds for all trees
with fewer than n vertices. To prove that it is true for a tree T with n vertices, remove an
edge e from T. This disconnects T and results in two trees T1 and T2 with n’ and n − n’ nodes
respectively, for some n’. Since n’ and n − n’ are both fewer than n, the total
number of edges in T1 and T2 put together is (n’ − 1) + (n − n’ − 1) = n − 2. Replacing e, the tree T has
(n − 2 + 1) = n − 1 edges. Hence the proof.


Problem 9.10 Extract a minimum cost spanning tree for the graph shown in Fig. I 9.10. 


Fig. I 9.10


Solution: Table I 9.1 illustrates the trace of Prim’s algorithm in obtaining the minimum cost
spanning tree.


Table I 9.1

Vertices V’                 | Minimum cost spanning tree
{2}                         | ∅
{2, 5}                      | {(2, 5)}
{2, 5, 7}                   | {(2, 5), (5, 7)}
{2, 5, 7, 6}                | {(2, 5), (5, 7), (7, 6)}
{2, 5, 7, 6, 4}             | {(2, 5), (5, 7), (7, 6), (5, 4)}
{2, 5, 7, 6, 4, 8}          | {(2, 5), (5, 7), (7, 6), (5, 4), (4, 8)}
{2, 5, 7, 6, 4, 8, 3}       | {(2, 5), (5, 7), (7, 6), (5, 4), (4, 8), (5, 3)}
{2, 5, 7, 6, 4, 8, 3, 1}    | {(2, 5), (5, 7), (7, 6), (5, 4), (4, 8), (5, 3), …}



Total Cost of the minimum spanning tree is 145. 


Problem 9.11 Obtain a solution to the single-source, shortest-path problem defined on the 
digraph shown in Fig. I 9.11. 


Solution: Table I 9.2 illustrates the trace of Dijkstra’s algorithm on the single-source, shortest-
path problem.


Table I 9.2

[Table: trace of Dijkstra’s algorithm on the digraph of Fig. I 9.11, followed by the resulting shortest paths and distances.]


Review Questions


1. Which of the following does not hold good for the given graph G? 





(i) an Eulerian walk exists for the graph 
(ii) the graph is an undirected graph 







(iii) the graph has a cycle 

(iv) the graph has a pendant vertex 

(a) (i) (b) (ii) (c) (iii) (d) (iv) 

2. Which of the following properties is not satisfied by two graphs that are isomorphic? 

(i) They have the same number of vertices 

(ii) They have the same number of edges 

(iii) They have an equal number of vertices with a given degree 

(iv) There must exist at least one cycle 

(a) (i) (b) (ii) (c) (iii) (d) (iv) 


3. For the graph shown in Review Question 1 (Chapter 9), the following matrix represents its 


d g ks 
2 | 1 0 0 0
5 | 0 0 1 1
7 | 1 1 1 0
9 | 0 1 0 1


(i) Adjacency matrix representation 
(ii) Incidence matrix representation 
(iii) Circuit matrix representation 
(iv) Cut set matrix representation 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
4. In the context of graph traversals, state whether true or false: 
(i) graph traversals could be employed to check for the connectedness of a graph 
(ii) for any graph, graph traversals always visit all vertices of the graph 
(a) (i) true (ii) true (b) (i) true (ii) false 
(c) (i) false (ii) true (d) (i) false (ii) false 
5. Which among the following properties is not satisfied by a minimum cost spanning tree T 
extracted from a graph G with n vertices? 
(i) T has a cycle 
(ii) T has (n-1) edges 
(iii) T has n vertices 
(iv) T is connected 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
6. Distinguish between digraphs and undirected graphs.
7. For a graph of your choice, trace its (i) adjacency matrix and (ii) adjacency list 
representations. 
8. Draw graphs that contain (i) an Eulerian walk, and (ii) a Hamiltonian circuit 
9. How can graph traversal procedures help in detecting graph connectivity? 
10. Discuss an application of minimum cost spanning trees. 
11. Trace (i) Breadth first traversal and (ii) Depth first traversal on the graph shown in 
Fig. R9.11, beginning from vertex y. 







Fig. R9.11 A strongly connected graph 


12. For the graph shown in Fig. R9.12, obtain the shortest path from vertex 1 to all other 
vertices: 


Fig. R9.12 Strongly connected components of a diagram


13. For the graph shown in Fig. R9.13, extract a minimum cost spanning tree.


Fig. R9.13 Graphs which are trees and not trees


Programming Assignments


1. Execute a program to input a graph G = (V, E) as an adjacency matrix. Include functions to 
(i) test if G is complete 
(ii) obtain a path and a simple path from vertex u to vertex v. 
(iii) obtain the degree of a node u, if G is undirected, and indegree and outdegree of node 
u if G is directed. 
2. Execute a program to input a graph G = (V, E) as an adjacency list. Include two functions 
BFT and DFT to undertake a breadth first and depth first traversal of the graph. Making use 
of the traversal procedures, test whether the graph is connected. 







3. Implement Dijkstra’s algorithm to obtain the shortest paths from the source vertex 1 to every 
other vertex of the graph G given below: 


Fig. P 9.3


4. Design and implement an algorithm to obtain a spanning tree of a connected, undirected
graph using breadth first or depth first traversal.

5. Design and implement an algorithm to execute depth first traversal of a graph represented 
by its incidence matrix. 

6. Design and implement an algorithm to obtain an Eulerian walk of an undirected graph in 
the event of such a walk being available. 

7. Implement the ADT for graphs in a programming language of your choice choosing a linked 
representation for the graphs. 


CHAPTER 10


BINARY SEARCH TREES AND AVL TREES


10.1 Introduction
10.2 Binary Search Trees: Definition and Operations
10.3 AVL Trees: Definition and Operations
10.4 Applications


Introduction


In Chapter 8, the tree and binary tree data structures were discussed. Binary search trees and
AVL trees are a category of binary trees which facilitate efficient retrievals. In this chapter the
definition of a binary search tree and its operations viz., retrieval, insertion and deletion are
discussed. However, binary search trees can have their setbacks too, the rectification of which
yields an AVL tree. The definition of the AVL search tree and the operations of retrieval,
insertion and deletion on the tree are elaborated next. The application of the two data
structures to the representation of symbol tables in compiler design is detailed last.





Binary Search Trees: Definition and Operations 


Definition 
A binary search tree T may be an empty binary tree. If non-empty, then for a set S, T is a labeled
binary tree in which each node u is labeled by an element or key e(u) ∈ S such that
(i) for each node u in the left subtree of a node v, e(u) < e(v)
(ii) for each node u in the right subtree of a node v, e(u) > e(v)
(iii) for each element a ∈ S there is exactly one node u such that e(u) = a.
In other words, a binary search tree T satisfies the following norms: 
(i) all keys of the binary search tree must be distinct 
(ii) all keys in the left subtree of T are less than the root element 
(iii) all keys in the right subtree of T are greater than the root element and 
(iv) the left and right subtrees of T are also binary search trees. 
Figure 10.1 illustrates an empty binary search tree and a non empty binary search tree defined 
for the set S ={G, M, B, E, K, I, Q, Z}. It needs to be emphasized here that for a given set S more 
than one binary search tree can be constructed. 





(a) Empty binary search tree (b) A non empty binary search
tree for S = {G, M, B, E, K, I, Q, Z}


Fig. 10.1 Example binary search trees


The inorder traversal of a binary search tree T yields the elements of the associated set S in
ascending order. If S = {a_i, i = 1, 2, ..., n}, then the inorder traversal of the binary search tree
yields the elements in ascending sequence, i.e. a_1 < a_2 < a_3 < ... < a_n. Thus, the inorder
traversal of the binary search tree shown in Fig. 10.1 results in {B, E, G, I, K, M, Q, Z} which are
the elements of S in ascending order.
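
The ordering property can be checked with a short Python sketch. The Node class and the two helper functions are hypothetical names introduced for illustration, and the keys are assumed to be inserted in the order in which S is listed, which need not be the order that produced the tree of Fig. 10.1.

```python
class Node:
    """A binary search tree node with a key and two child links."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the tree rooted at root and return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored


def inorder(root):
    """Left subtree, then the node itself, then the right subtree."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for key in ["G", "M", "B", "E", "K", "I", "Q", "Z"]:
    root = insert(root, key)

print(inorder(root))
```

Whatever the insertion order, the inorder sequence of the resulting tree is the sorted sequence of the keys.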


Representation of a binary search tree 


A binary search tree is commonly represented using a linked representation in the same way as
that of a binary tree (Sec. 8.5). The node structure and the linked representation of the binary
search tree shown in Fig. 10.1 are illustrated in Fig. 10.2. However, the null pointers of the nodes
may be explicitly represented using fictitious nodes called external nodes. The external nodes,
labeled as e_i, are shown as solid circles in Fig. 10.2. Thus a linked representation of a binary search
tree is viewed as a bundle of external nodes which represent the null pointers and internal nodes
which represent the keys. Such a binary tree is referred to as an extended binary tree. Obviously,
the number of external nodes in a binary search tree comprising n internal nodes is n + 1. The path
from the root to an external node is called an external path.


LCHILD | DATA | RCHILD
(a) Node structure of a binary search tree

(b) Linked representation of the binary search tree of Fig. 10.1(b), showing the internal nodes and the external nodes e_i

Fig. 10.2 Linked representation of a binary search tree


Retrieval from a binary search tree 


Let T be a binary search tree. To retrieve a key u from T, u is first compared with the root key
r of T. If u = r then the search is done. If u < r then the search continues in the left subtree of T. If
u > r then the search continues in the right subtree of T. The search is repeated recursively in the
left and right sub-subtrees with u compared against the respective root keys, until the key u is
either found or a null pointer is reached. If the key is found the search is termed successful and if
not found, is termed unsuccessful.

While all successful searches terminate at the appropriate internal nodes in the binary search
tree, all unsuccessful searches terminate only at the external nodes in the appropriate portion of
the binary search tree. Hence external nodes are also referred to as failure nodes. Thus if the
inorder traversal of a binary search tree yields the keys in the sequence a_1 < a_2 < a_3 < ... < a_n then
the failure nodes e_0, e_1, e_2, ..., e_n are all equivalence classes which represent cases of unsuccessful
searches on the binary search tree. While e_0 traps all unsuccessful searches of keys that are less
than a_1, e_1 traps those that are greater than a_1 and less than a_2, and so on. In general, e_i traps all
keys between a_i and a_(i+1) which are unsuccessfully searched. For example, in Fig. 10.2(b), all keys
less than B which result in unsuccessful searches terminate at the leftmost external node, those
which are greater than Q but less than Z terminate at the external node lying between Q and Z,
and so on.

Algorithm 10.1 illustrates the procedure to retrieve the location LOC of the node containing 
the element ITEM, from a binary search tree T. 





Algorithm 10.1: Procedure to retrieve ITEM from a binary search tree T


procedure FIND_BST(T, ITEM, LOC)
/* LOC is the address of the node containing ITEM which
   is to be retrieved from the binary search tree T. In case
   of unsuccessful search the procedure prints the message
   “ITEM not found” and returns LOC as NIL */

   if T = NIL then {print(“binary search tree T is empty”);
                    exit;}                      /* exit procedure */
   else
      LOC = T;
      while (LOC ≠ NIL) do
         case
            : ITEM = DATA(LOC) : return (LOC);        /* ITEM found in node LOC */
            : ITEM < DATA(LOC) : LOC = LCHILD(LOC);   /* search left subtree */
            : ITEM > DATA(LOC) : LOC = RCHILD(LOC);   /* search right subtree */
         endcase
      endwhile

   if (LOC = NIL) then {print(“ITEM not found”); return (LOC)}
                                                /* unsuccessful search */
end FIND_BST





Why are binary search tree retrievals more efficient than sequential list
retrievals?

For a list of n elements stored as a sequential list, the worst case time complexity of searching an
element, both in the case of successful search and unsuccessful search, is O(n). This is so since in
the worst case, the search key needs to be compared with every element of the list. However, in
the case of a binary search tree, as is evident in Algorithm 10.1, searching for a given key k
discards one of the two subtrees at every stage of its comparison with a node on its path
from the root downwards, which amounts to about half of the remaining tree when the tree is
well balanced. The best case time complexity for the retrieval operation is therefore
O(1), when the search key k is found in the root itself. The worst case occurs when the search key
is found in one of the leaf nodes whose level is equal to the height h of the binary search tree.
The time complexity of the search is then given by O(h). In some cases binary search trees may
grow to heights that equal n, the number of elements in the associated set, thereby increasing the
time complexity of a retrieval operation to O(n) in the worst case (see Sec. 10.2). However, on an
average, assuming random insertions/deletions, we obtain the height h of the binary search tree to
be O(log n), yielding a time complexity of O(log n) for a retrieval operation.


Example 10.1 Consider the set S = {416, 891, 456, 765, 111, 654, 345, 256, 333} whose
associated binary search tree T is shown in Fig. 10.3.

Let us retrieve the keys 333 and 777 from the binary search tree. Tables 10.1 and 10.2 show the
trace of Algorithm 10.1 (FIND_BST) for the retrieval of the two keys respectively. Here #(n),
where n ∈ S, indicates the location (address) of the node containing the key n. While retrieval of
333 yields a successful search terminating at node #(333), retrieval of 777 results in an
unsuccessful search terminating at the appropriate external node.


Fig. 10.3 A binary search tree


Table 10.1 Trace of Algorithm 10.1 for the retrieval of ITEM=333

LOC                     | ITEM <, =, > DATA(LOC)? | Updated LOC
Initially LOC = #(416)  | 333 < 416               | LOC = LCHILD(#(416)) = #(111)
LOC = #(111)            | 333 > 111               | LOC = RCHILD(#(111)) = #(345)
LOC = #(345)            | 333 < 345               | LOC = LCHILD(#(345)) = #(256)
LOC = #(256)            | 333 > 256               | LOC = RCHILD(#(256)) = #(333)
LOC = #(333)            | 333 = 333               | RETURN(#(333)); element found and node returned


Table 10.2 Trace of Algorithm 10.1 for the retrieval of ITEM=777

LOC                     | ITEM <, =, > DATA(LOC)? | Updated LOC
Initially LOC = #(416)  | 777 > 416               | LOC = RCHILD(#(416)) = #(891)
LOC = #(891)            | 777 < 891               | LOC = LCHILD(#(891)) = #(456)
LOC = #(456)            | 777 > 456               | LOC = RCHILD(#(456)) = #(765)
LOC = #(765)            | 777 > 765               | LOC = RCHILD(#(765)) = NIL
LOC = NIL               | Element not found       | RETURN(NIL)

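
The retrieval loop of Algorithm 10.1 can be sketched in Python as follows. This is a minimal illustration rather than the book's procedure: the names Node, insert and find_bst are hypothetical, the tree is assumed to be built by inserting the keys of S in the order listed in Example 10.1, and the function also returns the keys it visited so that the trace can be inspected.

```python
class Node:
    """A binary search tree node with a key and two child links."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Helper used only to build the example tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def find_bst(root, item):
    """Return (node, visited_keys); node is None for an unsuccessful search."""
    visited = []
    loc = root
    while loc is not None:
        visited.append(loc.key)
        if item == loc.key:
            return loc, visited          # ITEM found in node loc
        # Move to the left or right subtree, discarding the other one.
        loc = loc.left if item < loc.key else loc.right
    return None, visited                 # fell off the tree at a failure node


root = None
for key in [416, 891, 456, 765, 111, 654, 345, 256, 333]:
    root = insert(root, key)

print(find_bst(root, 333)[1])
print(find_bst(root, 777))
```

The visited keys reported for 333 and 777 correspond to the successive values of LOC in the two traces of Example 10.1.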


Insertion into a binary search tree 


The insertion of a key into a binary search tree is similar to the retrieval operation. The insertion 
of a key u initially proceeds as if it were trying to retrieve the key from the binary search tree, 
but on reaching the null pointer (failure node) which it is sure to encounter since key u is not 
present in the tree, a new node containing the key u is inserted at that position. 


Example 10.2 Let us insert keys 701 and 332 into the binary search tree T associated with 
set S ={ 416, 891, 456, 765, 111, 654, 345, 256, 333}, shown in Fig. 10.3. Figure 10.4 (a) shows the 
insertion of 701. Note how the operation moves down the tree along the path shown and, on
encountering a failure node, inserts the key 701 as the right child of the node containing 654.
The insertion of 332, which follows a similar procedure, is illustrated in Fig. 10.4(b).

The algorithm for the insert procedure is only a minor modification of Algorithm 10.1. Like
retrieval, an insert operation takes O(h) time for a tree of height h, which is O(log n) on average.
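
A possible form of that modification is sketched below in Python. The names Node, insert_bst and parent_of are hypothetical and introduced only for this illustration; the sketch assumes the tree of Fig. 10.3 results from inserting the keys of S in the order listed.

```python
class Node:
    """A binary search tree node with a key and two child links."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert_bst(root, key):
    """Search for key; on reaching a null link, attach a new node there."""
    new_node = Node(key)
    if root is None:
        return new_node
    loc = root
    while True:
        if key == loc.key:
            return root                  # key already present, nothing to do
        if key < loc.key:
            if loc.left is None:         # failure node reached on the left
                loc.left = new_node
                return root
            loc = loc.left
        else:
            if loc.right is None:        # failure node reached on the right
                loc.right = new_node
                return root
            loc = loc.right


def parent_of(root, key):
    """Return (parent_key, side) for a non-root key, or None if not applicable."""
    parent, side, loc = None, None, root
    while loc is not None and loc.key != key:
        parent = loc
        if key < loc.key:
            side, loc = "left", loc.left
        else:
            side, loc = "right", loc.right
    if loc is None or parent is None:
        return None
    return parent.key, side


root = None
for key in [416, 891, 456, 765, 111, 654, 345, 256, 333]:
    root = insert_bst(root, key)
root = insert_bst(root, 701)
root = insert_bst(root, 332)

print(parent_of(root, 701), parent_of(root, 332))
```

In this sketch 701 ends up as the right child of 654, as described in Example 10.2, and 332 becomes the left child of 333.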


Deletion from a binary search tree 


The deletion of a key from the binary search tree is not as straightforward as the insertion
operation. We first search for the node containing the key by undertaking a retrieval operation.
But once the node is identified, the following cases are tested before the node containing the key 
u is appropriately deleted from the binary search tree T: 

(i) key u is a leaf node 

(ii) key u has a lone subtree (left subtree or right subtree only) 
(iii) key u has both left subtree and right subtree 


Case (i) If the key u to be deleted is a leaf node then the deletion is trivial since the appropriate 
link field of the parent node of key u only needs to be set as NIL. Figure 10.5(a) illustrates this 
case. 


Case (ii) If the key u to be deleted has either a left subtree or a right subtree (but not both) then 
the link of the parent node of u is set to point to the appropriate subtree. Figure 10.5(b) illustrates 
the case. 


Case (iii) If the key u to be deleted has both a left subtree and a right subtree, then the problem 
is complicated. In this case since the right subtree comprises keys that are greater than u, the 
parent node of key u is now set to point to the right subtree of u. Now where do we accommodate 
the left subtree of u? Since all the keys of the left subtree of u are less than that of the right subtree 
of u, we move as far left as possible in the right subtree of u until an empty left subtree is found 
and link the left subtree of u at that position. Figure 10.5(c ) illustrates the case. 

The other methods of deletion in this case include replacing the key u with either the largest
key l occurring in the left subtree of u or the smallest key s in the right subtree of u. It is
guaranteed that the node holding l or s has either two empty subtrees or exactly one non empty
subtree. After replacing u with l or s as the case may be, the node that originally carried l or s is
deleted from the tree using the appropriate procedure [Case (i) or Case (ii)].


Example 10.3 Delete keys 333, 891 and 416 in the order given, from the binary search tree 
T associated with set S = {416, 891, 456, 765, 111, 654, 345, 256, 333} shown in Fig. 10.3. 





(a) Insertion of 701

(b) Insertion of 332

Fig. 10.4 Insertion of elements 701 and 332 into the binary search tree shown in Fig. 10.3


Deletion of 333, a leaf node, illustrates case(i). The RCHILD link of node #(256) is set to NIL. 


Figure 10.6(a) shows the deletion. 


Deletion of 891, a node with a single subtree (left subtree), illustrates case(ii). In this case the 
RCHILD link of node #(416) is set to point to node #(456). Figure 10.6(b) illustrates the deletion. 


(a) Deletion of a leaf node: the link of the parent x is set to NIL

(b) Deletion of a node with a single subtree: the link of x points to the left subtree of u

(c) Deletion of a node with both left and right subtrees: the link of x points to the right subtree of u, and the left subtree of u joins at the left most empty subtree of the right subtree of u

Fig. 10.5 Deletion of a key from a binary search tree


Lastly, the deletion of 416, the root node with both the left and right subtrees intact, results in
node #(456) taking over as the root. However, the left subtree of the root viz., the subtree with
node #(111) as the root, attaches itself as far left as possible in the right subtree of node #(416).
It therefore attaches itself to the LCHILD of node #(456). Figure 10.6(c) illustrates this case.

Algorithm 10.2 illustrates the deletion procedure on a binary search tree given NODE_U, the
node to be deleted and NODE_X its parent. For simplicity, the procedure illustrates only the
deletion operation for all non empty nodes NODE_U other than the root. A general procedure
to delete any ITEM from a binary search tree T can be easily attempted (Programming
Assignment 1 (Chapter 10)). The time complexity of the delete operation on a binary search tree
is O(log n) on average.





Algorithm 10.2: Procedure to delete a node NODE_U from a binary search tree given its
parent node NODE_X


procedure DELETE(NODE_U, NODE_X)
/* NODE_U is the node which is to be deleted from the binary
   search tree. NODE_X is the parent node, for which
   NODE_U may be the left subtree or the right subtree.
   Procedure DELETE is applicable for deletion of all non
   empty nodes other than the root (i.e.) NODE_U ≠ NIL and
   NODE_X ≠ NIL */

   case
      : LCHILD(NODE_U) = RCHILD(NODE_U) = NIL:      /* NODE_U is a leaf node */
           Set RCHILD(NODE_X) or LCHILD(NODE_X) to NIL based on
           whether NODE_U is the right child or left child of
           NODE_X respectively;
           call RETURN(NODE_U);                      /* dispose node to the
                                                        Available space list */

      : LCHILD(NODE_U) <> NIL and RCHILD(NODE_U) <> NIL:   /* NODE_U has both left
                                                               and right subtrees */
           /* attach the right subtree of NODE_U to NODE_X */
           Set RCHILD(NODE_X) or LCHILD(NODE_X) to RCHILD(NODE_U) based on
           whether NODE_U is the right child or left child of
           NODE_X respectively;
           /* find the left most empty subtree in
              the right subtree of NODE_U */
           TEMP = RCHILD(NODE_U);
           while (LCHILD(TEMP) <> NIL) do
              TEMP = LCHILD(TEMP);
           endwhile
           LCHILD(TEMP) = LCHILD(NODE_U);
           call RETURN(NODE_U);

      : LCHILD(NODE_U) <> NIL and RCHILD(NODE_U) = NIL:    /* NODE_U has only left
                                                               subtree */
           TEMP = LCHILD(NODE_U);
           Set RCHILD(NODE_X) or LCHILD(NODE_X) to TEMP based on
           whether NODE_U is the right child or left child
           of NODE_X respectively;
           call RETURN(NODE_U);

      : LCHILD(NODE_U) = NIL and RCHILD(NODE_U) <> NIL:    /* NODE_U has only right
                                                               subtree */
           TEMP = RCHILD(NODE_U);
           Set RCHILD(NODE_X) or LCHILD(NODE_X) to TEMP based on
           whether NODE_U is the right child or left child
           of NODE_X respectively;
           call RETURN(NODE_U);
   endcase
end DELETE
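
The three deletion cases can be sketched in Python as below. The sketch is not the book's procedure: it locates the node and its parent itself and returns the (possibly new) root, so it also covers deletion of the root, which Algorithm 10.2 leaves out. The names Node, insert, delete and inorder are hypothetical, and the tree is assumed to be built by inserting the keys of S in the order listed in Example 10.3.

```python
class Node:
    """A binary search tree node with a key and two child links."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Helper used only to build the example tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def inorder(root):
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


def delete(root, key):
    """Delete key from the tree and return the (possibly new) root."""
    parent, node = None, root
    while node is not None and node.key != key:
        parent, node = node, (node.left if key < node.key else node.right)
    if node is None:
        return root                      # key not present
    if node.left is not None and node.right is not None:
        # Case (iii): hang the left subtree at the left most empty
        # position of the right subtree, then promote the right subtree.
        temp = node.right
        while temp.left is not None:
            temp = temp.left
        temp.left = node.left
        replacement = node.right
    else:
        # Cases (i) and (ii): zero or one subtree takes the node's place.
        replacement = node.left if node.left is not None else node.right
    if parent is None:
        return replacement               # the root itself was deleted
    if parent.left is node:
        parent.left = replacement
    else:
        parent.right = replacement
    return root


root = None
for key in [416, 891, 456, 765, 111, 654, 345, 256, 333]:
    root = insert(root, key)
for key in [333, 891, 416]:
    root = delete(root, key)

print(root.key, root.left.key, inorder(root))
```

After the three deletions of Example 10.3 the node holding 456 is the root and the subtree rooted at 111 hangs from its left link, while the inorder sequence of the remaining keys is still sorted.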





The McGraw-Hill Companies 


226 Data Structures and Algorithms 


Binary search tree before deletion Binary search tree after deletion 


Delete 333 
u 





(a) Delete 333 





Delete 891 
—— 


(b) Delete 891 
N@ 
aso 


Delete 416 
pram A 


(c) Delete 416 





Fig. 10.6 Deletion of keys 333, 891 and 416 from the binary search tree shown in Fig. 10.3 







Drawbacks of a binary search tree 


Though binary search trees, in comparison to sequential lists, report a better average
performance of O(log n) time complexity for their insert, delete and retrieval operations, they
are not without their setbacks. As pointed out in Sec. 10.2, there are instances where binary
search trees may grow to heights that equal n, the number of elements represented in the tree,
thereby degrading their performance to O(n). This may occur due to a sequence of insert
operations or delete operations. Examples 10.4 and 10.5 illustrate instances when the height of
a binary search tree reaches n.


Example 10.4 Let us construct binary search trees for the sets $S_1$ = {A, B, C, D, E, F, G, H,
I, J, K, L, M} and $S_2$ = {M, L, K, J, I, H, G, F, E, D, C, B, A}. It can be seen that while the
elements of $S_1$ are in the ascending order of the alphabetical sequence, those in $S_2$ are in
the descending order of the sequence. The respective binary search trees are shown in Fig. 10.7.





Fig. 10.7 Binary search trees for the sets $S_1$ = {A, B, C, D, E, F, G, H, I, J, K, L, M} and
$S_2$ = {M, L, K, J, I, H, G, F, E, D, C, B, A}


Observe that the two binary search trees are right skewed and left skewed respectively. In such 
a case, the height h of the binary search tree is equal to n and hence a search operation on these 
binary search trees in the worst case would yield O(n) time complexity. 
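
The effect of the insertion order on the height can be checked with a short Python sketch. The names `Node`, `insert`, `height` and `build`, and the third ordering `mixed`, are assumptions made for this illustration; the height is counted here as the number of nodes on the longest path from the root to a leaf.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.lchild = None
        self.rchild = None


def insert(root, data):
    """Plain (unbalanced) binary search tree insertion, iterative."""
    new = Node(data)
    if root is None:
        return new
    cur = root
    while True:
        if data < cur.data:
            if cur.lchild is None:
                cur.lchild = new
                return root
            cur = cur.lchild
        else:
            if cur.rchild is None:
                cur.rchild = new
                return root
            cur = cur.rchild


def height(root):
    """Number of nodes on the longest root-to-leaf path (empty tree = 0)."""
    if root is None:
        return 0
    return 1 + max(height(root.lchild), height(root.rchild))


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


s1 = "ABCDEFGHIJKLM"            # ascending order: right skewed tree
s2 = "MLKJIHGFEDCBA"            # descending order: left skewed tree
mixed = "GCKAEIMBDFHJL"         # an order chosen here to give a balanced tree
print(height(build(s1)), height(build(s2)), height(build(mixed)))  # 13 13 4
```

The same 13 keys give a height of 13 or of 4 depending only on the order in which they arrive.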




Example 10.5 Consider a skeletal binary search tree shown in Fig. 10.8 (a). Deletion of node 
y in the tree yields the one shown in Fig. 10.8(b). Here again it may be seen that the binary search 
tree after deletion has yielded a left skewed binary tree once again resulting in O(n) time complexity 
in the event of a search operation. 


[Figure: Delete y. Panels: (a) Binary search tree before deletion, (b) Binary search tree after deletion]


Fig. 10.8 Deletion from a binary search tree resulting in a skewed binary tree 


It is clear from the above examples that if the height of a binary search tree is left unchecked,
the tree can become skewed and its performance deteriorates. In other words, it is essential
that binary search trees are maintained so as to have a balanced height. Trees whose height in
the worst case is O(log n) are known as balanced trees. AVL trees are one such class of trees
and are discussed in Sec. 10.3.


AVL Trees: Definition and Operations 





In Sec. 10.2 it was pointed out how binary search trees can reach heights equal to n, the number
of elements in the tree, thereby deteriorating their performance. To eliminate this drawback, it
must be ensured that the binary search tree remains of balanced height during every insert or
delete operation, since these operations can affect the structure and hence the height of the
tree. In other words, there needs to be a mechanism to ensure that an insert or delete operation
does not turn the tree into a skewed one. As mentioned earlier, trees whose height in the worst
case turns out to be O(log n) are known as balanced trees or height balanced trees. One such
class of balanced trees, viz., AVL trees, is discussed in this section. AVL trees were proposed
by Adelson-Velskii and Landis in 1962.







Definition 

An empty binary tree is an AVL tree. If non empty, the binary tree T is an AVL tree if (i) $T_L$ and
$T_R$, the left and right subtrees of T, are also AVL trees and (ii) $|h(T_L) - h(T_R)| \le 1$, where
$h(T_L)$ and $h(T_R)$ are the heights of the left subtree and right subtree of T respectively.

For a node u, $bf(u) = h(u_L) - h(u_R)$, where $h(u_L)$ and $h(u_R)$ are the heights of the left and
right subtrees of the node u respectively, is known as the balance factor (bf) of the node u. In an
AVL tree therefore, every node u has a balance factor bf(u) which may be either 0 or +1 or -1.
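
The definition translates directly into a recursive check. The following Python sketch is illustrative only; `Node`, `height`, `bf` and `is_avl` are names assumed here, and the balance factor is recomputed from the heights rather than stored in the node.

```python
class Node:
    def __init__(self, data, lchild=None, rchild=None):
        self.data = data
        self.lchild = lchild
        self.rchild = rchild


def height(t):
    """Height of a (sub)tree, with the empty tree having height 0."""
    return 0 if t is None else 1 + max(height(t.lchild), height(t.rchild))


def bf(u):
    """Balance factor: height of left subtree minus height of right subtree."""
    return height(u.lchild) - height(u.rchild)


def is_avl(t):
    """An empty tree is AVL; otherwise both subtrees are AVL and |bf| <= 1."""
    if t is None:
        return True
    return is_avl(t.lchild) and is_avl(t.rchild) and abs(bf(t)) <= 1


balanced = Node("M", Node("F"), Node("S"))
skewed = Node("A", None, Node("B", None, Node("C")))
print(bf(balanced), is_avl(balanced))   # 0 True
print(bf(skewed), is_avl(skewed))       # -2 False
```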

A binary search tree T which is an AVL tree is referred to as an AVL search tree. This section 
elaborates on the operations of insert, delete and retrieval performed on AVL search trees. 

Figure 10.9 illustrates examples of AVL trees and AVL search trees. The balance factor of each
of the nodes is indicated by the side of the node within parentheses. Note how the balance factors
of the nodes in the AVL trees are either 0 or +1 or -1.


[Figure: panels include (a) an empty AVL tree, (c) a non AVL tree and (e) a non AVL search tree, with the balance factor of each node shown in parentheses]


Fig. 10.9 Examples of AVL trees and non AVL trees 


AVL trees and AVL search trees just like binary trees or binary search trees may be represented 
using a linked representation adopting the same node structure (see Sec. 8.5 and Sec. 10.2). 
However to facilitate efficient rendering of insert and delete procedures, a field termed BF may 
be included in the node structure to record the balance factor of the specific node. 
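
A node structure carrying the BF field might look as follows in Python. The class name `AVLNode` and the lowercase field names are assumptions chosen to mirror LCHILD, DATA, BF and RCHILD; this is a sketch, not a prescribed layout.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AVLNode:
    data: Any
    lchild: Optional["AVLNode"] = None
    rchild: Optional["AVLNode"] = None
    bf: int = 0          # a new leaf has two empty subtrees, hence bf = 0


leaf = AVLNode("K")
parent = AVLNode("M", lchild=leaf, bf=+1)   # left subtree is one level taller
print(parent.bf, parent.lchild.data, leaf.bf)   # 1 K 0
```

Storing the balance factor avoids recomputing subtree heights on every insert or delete, at the cost of keeping the field up to date after each structural change.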







Retrieval from an AVL search tree 


The retrieval of a key from an AVL search tree is in no way different from the retrieval operation 
on a binary search tree. Algorithm 10.1 illustrating the find operation on a binary search tree T 
may be utilized for retrieval of an element from an AVL search tree as well. However, since the 
height of the AVL search tree of n elements is O(log n), the time complexity of the find procedure 
when applied on AVL search trees does not exceed O(log n). 


Insertion into an AVL search tree 


The insertion of an element u into an AVL search tree T proceeds exactly as one would insert
u in a binary search tree. However, if after insertion the balance factor of any of the nodes
turns out to be anything other than 0 or +1 or -1, then the tree is said to be unbalanced. To
balance the tree we undertake what are called rotations. Rotations are mechanisms which shift
some of the subtrees of the unbalanced tree to obtain a balanced tree.

With regard to rotations there are some important observations which are helpful in the
implementation of the operations on AVL trees. For the initiation of rotations, it is required that
the balance factors of all nodes in the unbalanced tree are limited to -2, -1, 0, +1 and +2. Also,
the rotation is initiated with respect to an ancestor node A that is closest to the newly inserted
node u and whose balance factor is either +2 or -2. If a node w after insertion of node u reports
a balance factor of bf(w) = +2 or -2, then its balance factor before insertion should have
been +1 or -1 respectively. The insertion of a node can only change the balance factors of those
nodes on the path from the root to the inserted node. If the closest ancestor node A of the inserted
node u has a balance factor bf(A) = +2 or -2, then prior to insertion the balance factors of all
nodes on the path from A to u, excluding A itself, must have been 0. In fact these observations
are vital to determining the closest ancestor A after insertion of u.

The rotations, which are of four different types, are listed below. The classification is based on
the position of the inserted node u with respect to the ancestor node A which is closest to the
node u and reports a balance factor of -2 or +2.

(i) LL rotation: node u is inserted in the left subtree (L) of left subtree (L) of A
(ii) LR rotation: node u is inserted in the right subtree (R) of left subtree (L) of A
(iii) RR rotation: node u is inserted in the right subtree (R) of right subtree (R) of A
(iv) RL rotation: node u is inserted in the left subtree (L) of right subtree (R) of A

Each of the four classes of rotations is illustrated with examples.
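
Since the tree is a search tree, the class of rotation can be decided by comparing keys alone. The Python sketch below assumes a hypothetical function `classify` that is handed the key of A, the keys of A's two children and the inserted key; none of these names come from the text.

```python
def classify(a_key, child_keys, u_key):
    """Return 'LL', 'LR', 'RR' or 'RL' for the inserted key u_key.

    a_key      : key held by the unbalanced ancestor A
    child_keys : (key of A's left child, key of A's right child); only the
                 child on the side that received u is examined
    """
    left_key, right_key = child_keys
    if u_key < a_key:                            # u is in the left subtree of A
        return "LL" if u_key < left_key else "LR"
    return "RR" if u_key > right_key else "RL"   # u is in the right subtree of A


# Keys below are illustrative; None marks a child that is not examined.
print(classify("M", ("F", None), "C"))   # LL
print(classify("S", ("F", "W"), "H"))    # LR
print(classify("M", ("F", "S"), "Z"))    # RR
print(classify("M", ("F", "S"), "P"))    # RL
```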


LL rotation 


Figure 10.10 illustrates a generic representation of LL type imbalance and the corresponding
rotation that is undertaken to set right the imbalance. After insertion of node u, the closest
ancestor node of node u reporting an imbalance (bf(A) = +2), viz., node A, is first found. For
simplicity of discussion, the generic tree shown in Fig. 10.10(a) has been so chosen to have the
ancestor node A occurring at the root. In reality the ancestor node A may occur anywhere down
the tree. Now with reference to the ancestor node A, we find that the node u has been inserted
in the left subtree (L) of left subtree (L) of A. This implies there is an LL type of imbalance, and
to balance the tree an LL rotation is called for. The AVL tree before insertion of u
(Fig. 10.10(a)), the unbalanced tree after insertion of u (Fig. 10.10(b)) and the balanced tree after
the LL rotation (Fig. 10.10(c)) have been illustrated.




[Figure: (a) Balanced AVL search tree before insertion; (b) AVL search tree unbalanced after insertion of u into $B_L$, with node u found inserted in the left subtree of left subtree of A; (c) AVL search tree balanced after LL rotation]


Fig. 10.10 Generic representation of an LL rotation 


Here u is found inserted in the left subtree of B, viz., $B_L$, where B is in the left subtree of A.
We assume the heights of the generic subtrees $A_R$, $B_L$ and $B_R$ to be h. Observe the imbalance in
the balance factor of A after insertion of u: bf(A), which was +1 before insertion of u, changes to
+2 after insertion. To balance the tree, the LL rotation pushes B up as the root of the AVL tree,
which results in node A slumping downwards to its right along with its right subtree $A_R$. Now the
tree is rearranged by shifting the right subtree of B, viz., $B_R$, to join A as its left subtree, leaving
$B_L$ (holding the inserted node u) undisturbed as the left subtree of B.
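
The link changes of the LL rotation amount to two assignments. The Python sketch below uses assumed names (`Node`, `rotate_ll`) and a hypothetical three-node tree in which the subtrees $A_R$, $B_L$ and $B_R$ were empty before the insertion.

```python
class Node:
    def __init__(self, data, lchild=None, rchild=None, bf=0):
        self.data = data
        self.lchild = lchild
        self.rchild = rchild
        self.bf = bf


def rotate_ll(a):
    """LL rotation about a (bf(a) = +2, bf(b) = +1); returns the new subtree root."""
    b = a.lchild
    a.lchild = b.rchild      # B_R becomes the left subtree of A
    b.rchild = a             # A moves down to become the right child of B
    a.bf = b.bf = 0          # after an insertion-time LL rotation both are balanced
    return b


# Hypothetical minimal case: u = C was inserted below B = F, the left child of A = M
u = Node("C")
b = Node("F", lchild=u, bf=+1)
a = Node("M", lchild=b, bf=+2)
root = rotate_ll(a)
print(root.data, root.lchild.data, root.rchild.data)   # F C M
```

When A is not the root of the whole tree, the caller must also make the parent of A point to the node returned by the rotation.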


Example 10.5 Consider the AVL search tree shown in Fig. 10.11(a). Let us insert C into the 
AVL search tree. To facilitate ease of understanding, the notations employed in the generic tree 
of Fig. 10.10 have been mapped to the given tree. Note how C finds itself inserted in the left 
subtree of left subtree of M, the closest ancestor node of C that shows bf(M) = +2 after insertion. 

The LL rotation pushes F up, to become the root of the tree and shifts the subtree with node 
K which was originally the right subtree of F, to the left subtree of M. 


[Figure: Insert C. Panels: (a) Balanced AVL tree before insertion, (b) Unbalanced AVL search tree after insertion of C, (c) Balanced AVL search tree after LL rotation]


Fig. 10.11 An example of LL rotation 







LR rotation 


Figure 10.12 illustrates the generic representation of an LR type of imbalance and the 
corresponding rotation that is undertaken to set right the imbalance. Here the node u on insertion 
finds A to be its closest ancestor node that is unbalanced and with reference to node A is inserted 
in the right subtree of left subtree of A. This therefore is an LR type of imbalance and calls for 
LR rotation to balance the tree. The AVL tree before insertion of u (Fig. 10.12 (a)), the unbalanced 
tree after insertion of u (Fig. 10.12(b)) and the balanced tree after the LR rotation (Fig. 10.12(c)) 
have been illustrated. 

Here u finds itself inserted in the right subtree of left subtree of A, the closest ancestor node.
The heights of the subtrees $A_R$, $B_L$, $B_R$, $C_L$ and $C_R$ are as shown in the figure. Let us
suppose u is found in $C_L$, the left subtree of C. The procedure is in no way different if u is found
in $C_R$, the right subtree of C. The LR rotation rearranges the tree by first shifting C to the root
node. Then the left subtree $C_L$ of C is shifted to the right subtree of B, and $C_R$, the right
subtree of C, is shifted to the left subtree of A. The rearranged tree is balanced. In the case of LR
rotation, the following observations hold:

If BF(C) = 0 after insertion of new node then BF(A) = BF(B) = 0 after rotation

If BF(C) = -1 after insertion of new node then BF(A) = 0, BF(B) = +1 after rotation

If BF(C) = +1 after insertion of new node then BF(A) = -1, BF(B) = 0 after rotation
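
The LR rotation together with the three balance factor observations can be sketched in Python as shown here. The names `Node` and `rotate_lr` are assumptions, and the key D used for $B_L$ in the sample tree is hypothetical. Setting BF(C) = 0 at the end is a standard property of the rotation that the observations above leave implicit.

```python
class Node:
    def __init__(self, data, lchild=None, rchild=None, bf=0):
        self.data = data
        self.lchild = lchild
        self.rchild = rchild
        self.bf = bf


def rotate_lr(a):
    """LR rotation about a (bf(a) = +2, bf(b) = -1); returns the new subtree root."""
    b = a.lchild
    c = b.rchild
    b.rchild = c.lchild          # C_L becomes the right subtree of B
    a.lchild = c.rchild          # C_R becomes the left subtree of A
    c.lchild, c.rchild = b, a    # C moves up with B and A as its children
    if c.bf == 0:
        a.bf = b.bf = 0
    elif c.bf == -1:
        a.bf, b.bf = 0, +1
    else:                        # c.bf == +1
        a.bf, b.bf = -1, 0
    c.bf = 0                     # the new root of the subtree is balanced
    return c


# H was inserted as the left child of C = L, so bf(L) = +1 before the rotation
h = Node("H")
c = Node("L", lchild=h, bf=+1)
b = Node("F", lchild=Node("D"), rchild=c, bf=-1)
a = Node("S", lchild=b, rchild=Node("W"), bf=+2)
root = rotate_lr(a)
print(root.data, root.lchild.rchild.data, root.rchild.lchild)   # L H None
print(a.bf, b.bf, root.bf)                                      # -1 0 0
```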


[Figure: (a) Balanced AVL search tree before insertion; (b) AVL search tree unbalanced after insertion of u into $C_L$, with node u found inserted in the right subtree of left subtree of A; (c) AVL search tree balanced after LR rotation]


Fig. 10.12 Generic representation of an LR rotation 


Example 10.6 Consider the AVL search tree shown in Fig. 10.13. The subtrees $C_L$ and $C_R$ of
node C in the generic representation shown in Fig. 10.12 are mapped to empty subtrees in this
tree. In other words, the node labeled L has empty left and right subtrees. Let us insert H into
the AVL search tree. Note how H gets inserted into the right subtree of left subtree of S, the
closest ancestor node of H that shows bf(S) = +2. The LR rotation rearranges the tree by first
pushing node L to be the root. As a result, node S slumps to its right along with its right subtree
comprising the element W. Thereafter, the original left subtree of L holding the newly inserted
node H is attached to F as its right subtree. In the absence of a right subtree for L (which was so
before rotation), only an empty tree is attached as the left subtree of S.










[Figure: Insert H, followed by LR rotation; $C_L$ and $C_R$ are empty subtrees. Panels: (a) Balanced AVL search tree before insertion, (b) Unbalanced AVL search tree after insertion of H, (c) Balanced AVL search tree after LR rotation]


Fig. 10.13 An example of LR rotation 


RR rotation 


The RR rotation is symmetric to the LL rotation. Figure 10.14 illustrates the generic representation 
of the RR rotation scheme. Observe how node u finds itself inserted in the right subtree of right 
subtree of A, the closest ancestor node that is unbalanced and the rotation is merely a mirror 
image of the LL rotation scheme. 


[Figure: (a) Balanced AVL search tree before insertion; (b) AVL search tree unbalanced after insertion of u into $B_R$, with node u found inserted in the right subtree of right subtree of A; (c) AVL search tree balanced after RR rotation]


Fig. 10.14 Generic representation of an RR rotation 


Example 10.6 Consider the AVL search tree shown in Fig. 10.15(a). The insertion of Z calls for
an RR rotation. The unbalanced AVL search tree and the balanced tree after RR rotation have been
shown in Figs 10.15(b) and 10.15(c) respectively.


RL rotation 


RL rotation is symmetric to LR rotation. Figure 10.16 illustrates the generic representation of the
RL rotation scheme. Here node u finds itself inserted in the left subtree of right subtree of node
A which is the closest ancestor node that is unbalanced. Note how the RL rotation is the mirror
image of the LR rotation scheme. As pointed out for the LR rotation scheme, the rotation
procedure for RL remains the same irrespective of u being inserted in $C_L$ or $C_R$, the left subtree
and right subtree of C respectively.







[Figure: Insert Z. Panels: (a) Balanced AVL search tree before insertion, (b) Unbalanced AVL search tree after insertion of Z, (c) Balanced AVL search tree after RR rotation]


Fig. 10.15 An example of RR rotation 


[Figure: (a) AVL search tree balanced before insertion; (b) AVL search tree unbalanced after insertion of u into $C_R$, with node u found inserted in the left subtree of right subtree of A; (c) AVL search tree balanced after RL rotation]


Fig. 10.16 Generic representation of an RL rotation 


Example 10.7 Consider the AVL search tree shown in Fig. 10.17(a). The insertion of M calls 
for an RL rotation. The unbalanced AVL search tree and the balanced tree after RL rotation have 
been shown in Figs 10.17(b) and 10.17(c) respectively. 





[Figure: Insert M; $C_L$ and $C_R$ are empty trees. Panels: (a) Balanced AVL search tree before insertion, (b) Unbalanced AVL search tree after insertion of M, (c) Balanced AVL search tree after RL rotation]


Fig. 10.17 An example of RL rotation 







In the above classes of rotations, LL and RR are called single rotations and LR and RL are
called double rotations. An LR rotation is a combination of an RR rotation followed by an LL
rotation, and an RL rotation is a combination of an LL rotation followed by an RR rotation.
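
The composition of double rotations from single ones can be made explicit in a short Python sketch. Only the links are rearranged here and balance factors are left out; the function names `rotate_ll`, `rotate_rr`, `rotate_lr` and `rotate_rl` and the sample keys are assumptions made for illustration.

```python
class Node:
    def __init__(self, data, lchild=None, rchild=None):
        self.data = data
        self.lchild = lchild
        self.rchild = rchild


def rotate_ll(a):
    """Single rotation: the left child of a moves up."""
    b = a.lchild
    a.lchild, b.rchild = b.rchild, a
    return b


def rotate_rr(a):
    """Single rotation: the right child of a moves up."""
    b = a.rchild
    a.rchild, b.lchild = b.lchild, a
    return b


def rotate_lr(a):
    a.lchild = rotate_rr(a.lchild)   # RR rotation about B, the left child of A
    return rotate_ll(a)              # followed by an LL rotation about A


def rotate_rl(a):
    a.rchild = rotate_ll(a.rchild)   # LL rotation about B, the right child of A
    return rotate_rr(a)              # followed by an RR rotation about A


lr = rotate_lr(Node("S", lchild=Node("F", rchild=Node("L"))))
rl = rotate_rl(Node("F", rchild=Node("S", lchild=Node("L"))))
print(lr.data, lr.lchild.data, lr.rchild.data)   # L F S
print(rl.data, rl.lchild.data, rl.rchild.data)   # L F S
```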

Algorithm 10.3 illustrates a skeletal procedure to insert an element into an AVL search tree.
The procedure initially tries to identify the most recent ancestor node A of the inserted element
whose bf(A) = ±1. If no such node A is found then all nodes in the path from the root to the newly
inserted node have a balance factor of 0 at the time of insertion and hence the tree cannot go
unbalanced due to the insertion. In such a case we only update the balance factors of the nodes
in the path from the root to the newly inserted node by updating the BF value of the node to +1
if ITEM is inserted in its left subtree and to -1 if it is inserted in its right subtree.


Algorithm 10.3: Skeletal procedure to insert an ITEM into an AVL search tree T 


procedure INSERT(T, ITEM)
/* Steps to insert an ITEM into an AVL search tree T.
   The node structure comprises the fields LCHILD, DATA,
   BF and RCHILD representing left child link, data,
   balance factor and right child link respectively */

   call GETNODE(X);      /* get ready new node X containing ITEM */
   DATA(X) = ITEM;
   LCHILD(X) = RCHILD(X) = NIL and BF(X) = 0;

   /* AVL search tree T is empty */

   if (T = NIL) then { Set T to X;
                       exit;
                     }

   /* AVL search tree T is non empty and ITEM is distinct
      from other elements in T */

   Find node P where ITEM is to be inserted as either the left child or right
   child of P by following a path from the root onwards. Also, while
   traversing down the tree in search of the point of insertion of ITEM, take
   note of the most recent ancestor node A whose BF(A) = ±1;

   Insert node X carrying ITEM as the left or right child of node P;

   /* If no ancestor node A is found, the balance factors of
      all nodes on the path from the root to the node
      containing ITEM are 0. The tree will therefore remain
      balanced even after insertion of ITEM. Merely update the
      BF fields of all the nodes on the path from the root to
      node P after insertion of ITEM and exit */

   if (node A not found) then { TEMP = T;

      /* update BF field of node to +1 if ITEM is inserted in
         its left subtree and to -1 if it is inserted in its right
         subtree */

      while (TEMP <> X) do
         if (DATA(X) > DATA(TEMP))
         then { BF(TEMP) = -1;
                TEMP = RCHILD(TEMP);
              }
         else { BF(TEMP) = +1;
                TEMP = LCHILD(TEMP);
              }
      endwhile
      exit;
   }

   /* If node A exists and BF(A) = +1 with ITEM
      inserted in the right subtree of A or BF(A) =
      -1 with ITEM inserted in the left subtree of
      A, then set BF(A) = 0. Update the balance factors
      of all nodes in the path from node A to the
      inserted node X */

   if (node A found)
   then
   { if (BF(A) = +1 and ITEM was inserted in the right subtree of A) or
        (BF(A) = -1 and ITEM was inserted in the left subtree of A)
     then { BF(A) = 0;
            Update the balance factors of all nodes in the path from node A to
            the inserted node X;
            exit;
          }
     else

        /* AVL search tree T is unbalanced. Classify the imbalance
           and perform the appropriate rotations */

        Identify the type of imbalance and apply the appropriate rotations.
        Update the balance factors of the nodes as required by the rotation
        scheme as well as reset the LCHILD and RCHILD links of the
        appropriate nodes.
   }
end INSERT


If node A exists and bf(A) = +1 and the insertion is done in the right subtree of A, or if bf(A)
= -1 and the insertion is done in the left subtree of A, then we set bf(A) = 0. Also, we update the
balance factors of all nodes in the path from the node A to the newly inserted node. In all other
cases, the type of imbalance is identified and the appropriate rotations are carried out. This may
call for updating the balance factors of the involved nodes as well as resetting the link fields of
the relevant nodes after identifying the appropriate B, C, $A_L$, $A_R$, $B_L$, $B_R$, $C_L$ and $C_R$
relevant to the rotation scheme.

The time complexity of the insert operation is O(height) = O(log n). 
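
One way the skeletal procedure might be fleshed out is sketched below in Python. This is an illustration under stated assumptions rather than the book's implementation: all names (`Node`, `insert`, `_mark_path`, `_rotate`, `height`) are hypothetical, keys are assumed distinct, and the four rotations are folded into one helper that mirrors the left and right cases through a direction flag `d`.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.lchild = None
        self.rchild = None
        self.bf = 0                      # balance factor of a new leaf


def _mark_path(node, x):
    """Set bf on the path node -> x; every node on it had bf = 0 before."""
    while node is not x:
        if x.data < node.data:
            node.bf = +1                 # x went into the left subtree
            node = node.lchild
        else:
            node.bf = -1                 # x went into the right subtree
            node = node.rchild


def _rotate(a, d):
    """Rebalance A whose bf would be 2*d (d = +1: left heavy, -1: right heavy)."""
    near, far = ("lchild", "rchild") if d == +1 else ("rchild", "lchild")
    b = getattr(a, near)
    if b.bf == d:                        # LL or RR: single rotation
        setattr(a, near, getattr(b, far))
        setattr(b, far, a)
        a.bf = b.bf = 0
        return b
    c = getattr(b, far)                  # LR or RL: double rotation
    setattr(b, far, getattr(c, near))
    setattr(a, near, getattr(c, far))
    setattr(c, near, b)
    setattr(c, far, a)
    a.bf = -d if c.bf == d else 0
    b.bf = d if c.bf == -d else 0
    c.bf = 0
    return c


def insert(root, item):
    """Insert item (assumed distinct) and return the root of the tree."""
    x = Node(item)
    if root is None:
        return x
    a = f = parent = None                # A, the parent of A, the parent of p
    p = root
    while p is not None:
        if p.bf != 0:
            a, f = p, parent             # most recent node with bf = +1 or -1
        parent, p = p, (p.lchild if item < p.data else p.rchild)
    if item < parent.data:
        parent.lchild = x
    else:
        parent.rchild = x
    if a is None:                        # all path nodes had bf = 0
        _mark_path(root, x)
        return root
    d = +1 if item < a.data else -1      # side of A that received x
    _mark_path(a.lchild if d == +1 else a.rchild, x)
    if a.bf == -d:                       # shorter side grew: A is now even
        a.bf = 0
        return root
    sub = _rotate(a, d)                  # taller side grew: rotation needed
    if f is None:
        return sub
    if f.lchild is a:
        f.lchild = sub
    else:
        f.rchild = sub
    return root


def height(t):
    return 0 if t is None else 1 + max(height(t.lchild), height(t.rchild))


root = None
for key in "ABCDEFGHIJKLM":              # the ascending sequence of Example 10.4
    root = insert(root, key)
print(height(root), root.data)           # 4 H
```

The ascending sequence that produced a right skewed binary search tree of height 13 now yields a tree of height 4, since each insertion does work proportional to the path length only.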





Deletion from an AVL search tree 


To delete an element from an AVL search tree we discuss the operation based on whether the
node t carrying the element to be deleted is either a leaf node or one with a single non empty
subtree or with two non empty subtrees. A delete operation just like an insert operation may also
imbalance an AVL search tree. Just as LL/LR/RL/RR rotations are called for to rebalance the tree
after insertion, a delete operation also calls for rotations categorized as L and R. While the L
category is further classified as L0, L1 and L-1 rotations, the R category is further classified as
R0, R1 and R-1 rotations. The Classify rotations for deletion section of Algorithm 10.4 details
the mode of classification of the L and R rotations. However, it needs to be remembered that not
all deletions call for rotations.

Considering the complex nature of the operation we present its skeletal work procedure in two
algorithms viz., Algorithms 10.4 and 10.5. While Algorithm 10.4 illustrates the case when node
t is a leaf node or one with a single non empty subtree, Algorithm 10.5 illustrates the case when
node t is one with two non empty subtrees. For the invocation of the algorithms we assume that
the node holding the ITEM which is to be deleted from the AVL search tree T, viz., node t, has
already been found.


Algorithm 10.4: Skeletal procedure to delete an ITEM from a non empty AVL search tree 
T where node t in T holding ITEM is either a leaf node or one with a single 
subtree 


procedure DELETE1(T, node t)
/* Steps to delete an ITEM from a non-empty AVL search tree T.
   node t that holds ITEM and is either a leaf node or one with
   a single non empty subtree has been identified. The node
   structure comprises the fields LCHILD, DATA, BF and RCHILD
   representing left child link, data, balance factor and right
   child link respectively */

   if node t is a leaf node or a node with a single child
   then
   { Let node p be its parent node;
     Delete node t and reset the links of node p appropriately so as
     to either have a null link or to point to the lone child of node
     t as is the case;
   }
   else call DELETE2(T, node t);   /* node t has two non empty subtrees */

   Update balance factors:

   Rule 1: With regard to node p, if node t's deletion occurred in its right
           subtree then bf(p) increases by 1 and if it occurred in its left
           subtree then bf(p) decreases by 1.

   Rule 2: If the new bf(p) = 0 then the height of the tree is decreased by
           1 and therefore this calls for updating the balance factors of
           its parent node and/or its ancestor nodes.

   Rule 3: If the new bf(p) = ±1, then the height of the tree is the same
           as it was before deletion and therefore the balance factors of
           the ancestor nodes remain unchanged.

   Rule 4: If the new bf(p) = ±2, then the node p is unbalanced and the
           appropriate rotations need to be called for.

   Classify rotations for deletion:

   While propagating the balance factor updates from the node p upwards to
   the root node there may be nodes whose balance factors are updated to ±2.
   Let A be the first such node on the path from node p to the root.

   if the deletion took place on the right of A
   then classify the rotation as R or else classify it as L;

   For the R classification, if bf(A) = +2 then it should have been +1 before
   deletion and A should have a left subtree with root B. Based on bf(B) being
   either 0 or +1 or -1, classify the R rotations further as R0, R1 and R-1
   respectively /* See Sec. 10.3: R category rotations associated with the
   delete operation */

   For the L classification, if bf(A) = -2 then it should have been -1 before
   deletion and A should have a right subtree with root B. Based on bf(B)
   being either 0 or +1 or -1, classify the L rotations further as L0, L1
   and L-1 respectively /* See Sec. 10.3: L category rotations associated with
   the delete operation */

   Perform rotation:

   Perform the appropriate rotations to balance the tree.

end DELETE1
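
Rules 1 to 4 and the classification step can be captured in two small functions. The Python sketch below is illustrative; the function names `apply_rules` and `classify_deletion` and the action labels "propagate", "stop" and "rotate" are assumptions, not terms from the algorithm.

```python
def apply_rules(bf_p, deleted_on_right):
    """Apply Rules 1 to 4 to a node p and report what must happen next."""
    new_bf = bf_p + 1 if deleted_on_right else bf_p - 1   # Rule 1
    if new_bf == 0:
        action = "propagate"     # Rule 2: height shrank, update the ancestors
    elif abs(new_bf) == 1:
        action = "stop"          # Rule 3: height unchanged, ancestors untouched
    else:
        action = "rotate"        # Rule 4: node is unbalanced
    return new_bf, action


def classify_deletion(deleted_on_right, bf_b):
    """Name the rotation at A from the side of deletion and bf(B)."""
    if bf_b not in (-1, 0, 1):
        raise ValueError("bf(B) must be -1, 0 or +1")
    category = "R" if deleted_on_right else "L"
    return f"{category}{bf_b}"


# A node with only a right child (bf = -1) loses that child: Rule 2 applies.
print(apply_rules(-1, True))          # (0, 'propagate')
# Its parent had bf = 0 and the deletion lay on its right: Rule 3 applies.
print(apply_rules(0, True))           # (1, 'stop')
print(apply_rules(+1, True))          # (2, 'rotate')
print(classify_deletion(True, 0))     # R0
print(classify_deletion(False, -1))   # L-1
```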


Algorithm 10.5: Skeletal procedure to delete an ITEM from a non empty AVL search tree T 
where node t in T holding ITEM is one with both a left and a right subtree 
procedure DELETE2(T, node t)
/* Steps to delete an ITEM from a non-empty AVL search tree
   T. node t holding ITEM, which has both a left and a right
   non empty subtree, has been identified. The node structure
   comprises the fields LCHILD, DATA, BF and RCHILD representing
   left child link, data, balance factor and right child
   link respectively */

   if node t is one with a non-empty left and right subtree
   then
   {
     Find the smallest key in the right subtree of node t. This is
     obtained by moving down the RCHILD link of node t to reach node
     u and traversing the LCHILD links of the left subtree of node u
     until the link is empty;

     Let node v be the last node reached while traversing the left
     subtree of node u down the left child nodes.
     Now, LCHILD(node v) = NIL. Let SUCC = DATA(node v);

     Replace DATA(node t) with SUCC;

     Delete node v using Procedure DELETE1(T, node v), since node v will
     either be a leaf node or one with a single subtree.
   }
end DELETE2


Rotation free deletion of a leaf node In the case of deletion of node t which is a leaf
node, we physically delete the node and make the child link of its parent, viz., node p, null. Now
update the balance factor of node p based on whether the deletion occurred to its right or left.
If it had occurred in the right then we increase bf(node p) by 1 or else decrease bf(node p) by 1.
The new updated value of bf(node p) is now tested against Rules 1 to 4 of Algorithm 10.4 for
updating the balance factors of its ancestor nodes. In the event of any imbalance, the appropriate
rotations viz., L0/R0, L1/R1 or L-1/R-1 are called for rendering the tree balanced.


Example 10.8 Consider the balanced AVL search tree shown in Fig. 10.18. Deletion of 30 is
a case of deleting a leaf node. Figure 10.18(a) shows the balanced tree after deletion. Observe how
the parent of node 30, viz., the node holding 25, resets its RCHILD link to NIL after the physical
deletion of the node holding 30. Now how are the balance factors of the other nodes updated?
Since the deletion of key 30 occurred to the right of key 25, bf(25) is increased by 1. Hence the
new updated bf(25) is 0. As per Rule 2 outlined in Algorithm 10.4, this calls for the update of the
balance factors of all ancestor nodes of key 25. Since the root (key 20) is the only ancestor node
available and the deletion took place to its right, bf(20) is increased by 1 which yields the new
value as bf(20) = +1. We now see that the tree is automatically balanced and in this case no
rotations were called for.

[Figure: balanced tree before deletion and balanced tree after deletion for each case. Panels: (a) delete 30, (b) Deletion of a node with a single subtree, (c) Deletion of a node with both left and right subtree (delete 15)]

Fig. 10.18 Rotation free deletion of nodes in an AVL search tree


Rotation free deletion of a node having a single subtree In the case of deletion of
node t with a single subtree, just as before, we reset the child link of the parent node, node p, to
point to the child node of node t. The balance factors of node p and/or its ancestor nodes are
updated using Rules 1 to 4 of Algorithm 10.4.


Example 10.9 Deletion of key 25 illustrated in Fig. 10.18(b) is a case of deletion of a node 
with a single subtree. Observe how the RCHILD link of node 20 which is the root is reset to point 
to node 30, the right child of node 25. Since the deletion occurred to the right of node 20, bf(20) 
is updated to +1. Again the deletion does not call for any rotation to balance the tree. 


Rotation free deletion of a node having both the subtrees In the case of deletion
of node t which has both its subtrees non empty, the deletion is a little more involved
(Algorithm 10.5). We first replace DATA(node t) with the smallest key of the right subtree of
node t or with the largest key of the left subtree of node t. Algorithm 10.5 illustrates replacement
using the smallest key of the right subtree of node t. The smallest key of the right subtree of node
t can be obtained by moving right and then moving deep left until the left child link is NIL.
Similarly, moving left and then moving deep right until an empty right child link is seen will
yield the largest element in the left subtree of node t. An important observation here is that the
node representing the smallest value of the right subtree or the largest value of the left subtree
will turn out to be either a leaf node or a node with a single subtree. Now we physically delete the
node holding this replacement value using the procedure discussed for the deletion of a leaf node
or a node with a single subtree (Algorithm 10.4).
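
The two ways of locating the replacement key can be sketched as follows in Python. The names `Node`, `smallest_in_right_subtree` and `largest_in_left_subtree` are assumptions, and the sample tree is hypothetical.

```python
class Node:
    def __init__(self, data, lchild=None, rchild=None):
        self.data = data
        self.lchild = lchild
        self.rchild = rchild


def smallest_in_right_subtree(t):
    """Move right once, then deep left until the left child link is empty."""
    v = t.rchild
    while v.lchild is not None:
        v = v.lchild
    return v


def largest_in_left_subtree(t):
    """Move left once, then deep right until the right child link is empty."""
    v = t.lchild
    while v.rchild is not None:
        v = v.rchild
    return v


# Hypothetical tree: 20 with left subtree 10 (5, 15 (12, -)) and right subtree 30 (25 (-, 27), 40)
t = Node(20,
         Node(10, Node(5), Node(15, Node(12))),
         Node(30, Node(25, None, Node(27)), Node(40)))
succ = smallest_in_right_subtree(t)
pred = largest_in_left_subtree(t)
print(succ.data, pred.data)                 # 25 15
print(succ.lchild is None, pred.rchild is None)   # True True
```

In both cases the node found has at least one empty child link, which is why its physical deletion falls under the simpler leaf or single subtree case.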


Example 10.10 Deletion of node 15 from the balanced AVL search tree shown in Fig. 10.18
yields the balanced tree shown in Fig. 10.18(c). Here the value 15 in the respective node is replaced
by 18, the smallest value in the right subtree of node 15. Now we physically delete the leaf node
holding 18 using the case discussed earlier. Observe how bf(18) (the balance factor of the node
which earlier had the value 15 but now holds the replaced value 18) is updated to +1 and,
applying Rule 3 of Algorithm 10.4, the balance factors of no other ancestor nodes are changed.
The example once again illustrates a case of rotation free deletion.

As pointed out in Rule 4 of Algorithm 10.4, during the propagation of balance factor updates
from the specific node upwards to the root, there could be cases of imbalance amongst the nodes.
To set right the imbalance it is essential that the category of rotation viz., L or R is identified
before sub classifying it further as L0/R0, L1/R1 or L-1/R-1. All the rotations are performed with
regard to the first ancestor node A encountered on the path to the root node whose bf(A) =
±2. Each of the R category and L category of rotations is illustrated with examples.


R category rotations associated with the delete operation 


R0 rotation Figure 10.19 illustrates the generic representation of an R0 rotation. Node t is to
be deleted from a balanced tree with A shown as the root (for simplicity) and with $A_R$ as its right
subtree. B is the root of A's left subtree and has two subtrees $B_L$ and $B_R$. The heights of the
subtrees are as shown in the figure. Now the deletion of node t results in an imbalance with
bf(A) = +2. Since deletion of node t occurred to the right of A and since A is the first node on the
path to the root reporting an imbalance, the situation calls for an R rotation with respect to A.
Again since bf(B) = 0, the rebalancing needs to be brought about using an R0 rotation only. Here,
B pushes itself up to occupy A's position, pushing node A to its right along with $A_R$. While B
retains $B_L$ as its left subtree, $B_R$ is handed over to node A to function as its left subtree.
Observe how the tree regains its balance.

[Figure: Delete node t, followed by R0 rotation. Panels: Balanced tree before deletion, Unbalanced tree after deletion, Balanced tree after deletion]

Fig. 10.19 Generic representation of an R0 rotation
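
The link changes of the R0 rotation are the same as those of an LL rotation; only the resulting balance factors differ. The Python sketch below uses assumed names (`Node`, `rotate_r0`) and a hypothetical tree. The balance factors set after the rotation are derived here from the heights: if $B_L$ and $B_R$ have height h and $A_R$ has height h - 1 after the deletion, then A ends with bf = +1 and B with bf = -1.

```python
class Node:
    def __init__(self, data, lchild=None, rchild=None, bf=0):
        self.data = data
        self.lchild = lchild
        self.rchild = rchild
        self.bf = bf


def rotate_r0(a):
    """R0 rotation about a (bf(a) = +2, bf(b) = 0); returns the new subtree root."""
    b = a.lchild
    a.lchild = b.rchild      # B_R is handed over to A as its left subtree
    b.rchild = a             # A moves down to the right of B along with A_R
    a.bf, b.bf = +1, -1      # derived from the subtree heights (see lead-in)
    return b


# Hypothetical state after deleting the only node of A's right subtree:
# A = 50 with bf = +2, B = 30 with children 20 and 40 and bf = 0
a = Node(50, lchild=Node(30, lchild=Node(20), rchild=Node(40)), bf=+2)
root = rotate_r0(a)
print(root.data, root.bf, root.rchild.data, root.rchild.bf)   # 30 -1 50 1
print(root.lchild.data, root.rchild.lchild.data)              # 20 40
```

Unlike an insertion-time rotation, the subtree keeps its height after an R0 rotation, so no further balance factor updates are needed above it.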


Example 10.11 Figure 10.20 illustrates the deletion of key 65 from the balanced tree shown.
Since 65 occurred to the right of 55, bf(55) is updated to 0. This implies the balance factors of its
ancestor nodes need to be updated. When we proceed to the node holding 50, bf(50) is updated
to +2. This calls for an R0 rotation with respect to node 50. The notations of the generic
representation (Fig. 10.19) have been mapped to the given tree for ease of understanding. At the
end of the rotation the tree is balanced with 30 as its root and the tree appropriately rearranged.






Fig. 10.20 Deletion of a node calling for R0 rotation (panels: balanced tree before deletion; unbalanced tree after Delete 65; balanced tree after the R0 rotation) 


R1 rotation  Figure 10.21 illustrates the generic representation of an R1 rotation. Deletion of 
node t occurs to the right of A, the first ancestor node with bf(A) = +2. But bf(B) = +1 classifies it 
further as an R1 rotation. The rotation is similar to the R0 rotation and yields a balanced tree. 





Fig. 10.21 Generic representation of an R1 rotation (panels: balanced tree before deletion; unbalanced tree after the deletion of node t; balanced tree after the R1 rotation) 


Example 10.12 Figure 10.22 shows the deletion of key 80 from the given balanced AVL 
search tree. The deletion occurs to the right of 76, and updating the balance factors of the 
ancestor nodes yields bf(76) = +2; node 76 is the first and only ancestor node reporting an 
imbalance. An R1 rotation yields 60 as the root, with the tree accordingly rearranged to balance it. 






Fig. 10.22 Deletion of a node calling for R1 rotation (panels: balanced tree before deletion; unbalanced tree after Delete 80; balanced tree after the R1 rotation) 


R-1 rotation  The generic representation of an R-1 rotation is shown in Fig. 10.23. As in the 
other rotations, the deletion of node t results in the imbalance of the tree with regard to A and also 


Fig. 10.23 Generic representation of an R-1 rotation (panels: balanced tree before deletion; unbalanced tree after the deletion of node t; balanced tree after the R-1 rotation) 


leaves bf(B) = -1, calling for an R-1 rotation. Here let C be the root of the right subtree of B and C_L 
and C_R its left and right subtrees respectively. During the rotation C elevates itself to become the 
root, pushing A along with its right subtree A_R to its right. The tree is now rearranged with C_L 
as the right subtree of B and C_R as the left subtree of A. The tree automatically gets balanced. 
R-1 rotation is a case of double rotation where the rotation is once applied over B and then again 
over A. 
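
The double rotation can be sketched as follows. This is a minimal sketch under assumptions: the names `Node`, `height`, `bf` and `rotate_r_minus1` and all sample keys are hypothetical, and the balance factor is again taken as height(left) - height(right).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height of a subtree; an empty subtree has height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def bf(node):
    """Balance factor: height(left subtree) - height(right subtree)."""
    return height(node.left) - height(node.right)


def rotate_r_minus1(a):
    """R-1 (double) rotation about A; returns the new root C."""
    b = a.left
    c = b.right
    b.right = c.left   # C_L becomes the right subtree of B
    a.left = c.right   # C_R becomes the left subtree of A
    c.left = b         # B becomes the left child of C
    c.right = a        # A, along with A_R, becomes the right child of C
    return c


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


# Hypothetical tree after a leaf below 46 has been deleted:
# A = 35, B = 21 and C = 30; A_R is the single node 46.
a = Node(35,
         left=Node(21, Node(10), Node(30, Node(25), Node(32))),
         right=Node(46))
before = (bf(a), bf(a.left))            # bf(A) = +2 and bf(B) = -1, the R-1 case

root = rotate_r_minus1(a)
after = (root.key, bf(root), bf(root.left), bf(root.right))
keys = inorder(root)
```

For this particular tree all three nodes A, B and C end with a balance factor of 0. In general the final balance factors of A and B depend on the balance factor that C had before the rotation.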


Example 10.13 Figure 10.24 illustrates the deletion of key 40 from the given balanced AVL 
search tree. Since 40 occurs to the left of the key 46, bf(46) is updated to 0, triggering an update 
of bf(35), which yields +2. Since bf(21) = -1, we resort to an R-1 rotation. The rest of the steps in the 
rotation follow those shown in the generic representation (Fig. 10.23). 


Fig. 10.24 Deletion of a node calling for R-1 rotation (panels: balanced tree before deletion; unbalanced tree after deletion; balanced tree after deletion) 


L category rotations associated with the delete operation 


If the deletion of node t occurs to the left of A, the first ancestor node on the path to the root 
reporting bf(A) = -2, then the category of rotation to be applied is L. As in R rotations, based on 
bf(B) = +1, -1 or 0, the L rotation is further classified as L1, L-1 and L0 respectively. The generic 
representations of the L0, L1 and L-1 rotations are shown in Fig. 10.25. An illustrative example 
for the L category of rotations is presented in Illustrative Problem I 10.9. 

Unlike insertion, rebalancing a tree after a deletion may require more than one rotation. In the 
worst case the number of rotations required is O(log n), since a rotation at one node can leave an 
ancestor further up the path to the root imbalanced. It can be observed that there are similarities 
between the LL, LR, RL and RR rotations undertaken during insertion and the L0/R0, L1/R1 and 
L-1/R-1 rotations undertaken during deletion: R0 and R1 restructure the tree as an LL rotation does, 
R-1 as an LR rotation does, L0 and L-1 as an RR rotation does, and L1 as an RL rotation does. 





Applications 10.4 


Representation of symbol tables in compiler design 


Compilers are translators that translate a source programming language code into a target 
programming language code viz., machine code or Assembly level language code. The various 
phases in the design of compilers include Lexical analysis, Syntactic analysis, Semantic analysis, 
Intermediate code generation, Code optimization and Code generation. 


Fig. 10.25 Generic representations of L0, L1 and L-1 rotations (panels: (a) L0 rotation, (b) L1 rotation, (c) L-1 rotation; each shows the balanced tree before deletion, the unbalanced tree after deletion and the balanced tree after deletion) 


During lexical analysis, which is the first phase of the compiler, the source program is scanned 
character by character to identify the keywords, user identifiers, constants, labels etc., which are 
termed tokens. These tokens are stored in data structures called symbol tables, which store 
information pertaining to the tokens as name-attribute pairs. Thus there are individual symbol 
tables for keywords, user identifiers, constants etc. 

Symbol tables which are constructed for a fixed set of data already known in advance and 
calling for no insert or delete operations after construction, but are only susceptible to search or 
retrieval operations are known as static tables. On the other hand those symbol tables which 
support insertion and deletion operations besides search are known as dynamic tables. While a 
keyword table is an example of a static symbol table, a user identifier table is an example of a 
dynamic table. 

With regard to the keywords, which are fixed for a given source language, a compiler stores 
them in a static symbol table, using a data structure which favors efficient retrievals. This is so 
since the Lexical Analyzer distinguishes a keyword token k from a user identifier token u by 
searching for the tokens k and u in the keyword table. While the search for k in the keyword table 
would be successful, the same for u would be unsuccessful. Those appropriately selected tokens 
which yield an unsuccessful search in the keyword table are classified as user id tokens and stored 
separately in a user id table. It is here that one sees the application of binary search trees, and 
we terminate any further discussion on compilers at this point. 

Since both keyword and user id tokens are searched for in the keyword table to determine their 
presence or absence, it is essential that the keyword table is represented using a data structure 
that supports efficient retrievals. A binary search tree is an excellent candidate 
for the representation of both static and dynamic symbol tables considering its O(log n) average 
case complexity for insert, delete and retrieval operations. Figure 10.26 shows a sample keyword 
table represented using a binary search tree. Figure 10.27 illustrates a successful and an 
unsuccessful search of a token on the tree shown in Fig. 10.26. While the retrieval of token 
“while” yields success in 5 comparisons, the retrieval of “average val”, which is a user id, 
results in an unsuccessful search in 4 comparisons and hence finds its place in the user id table 
shown. 
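
The use of a keyword table held as a binary search tree can be sketched as below. The keyword list, its insertion order, the token `average_val` and all function names are assumptions chosen for illustration; they do not reproduce the tree of Fig. 10.26, so the comparison counts differ from those quoted above.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the binary search tree and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def search(root, key):
    """Return (found, number of nodes whose key was compared with the search key)."""
    comparisons = 0
    node = root
    while node is not None:
        comparisons += 1
        if key == node.key:
            return True, comparisons
        node = node.left if key < node.key else node.right
    return False, comparisons


# Hypothetical keyword set; the keyword table is static once it is built.
keyword_table = None
for kw in ["if", "else", "while", "for", "return", "do", "int"]:
    keyword_table = insert(keyword_table, kw)

user_id_table = []


def classify(token):
    """Classify a token as a keyword or a user id by searching the keyword table."""
    found, _ = search(keyword_table, token)
    if found:
        return "keyword"
    if token not in user_id_table:
        user_id_table.append(token)   # unsuccessful search: record as a user id
    return "user id"


kind_while = classify("while")
kind_avg = classify("average_val")
```

Only the user id table changes as tokens are classified, which reflects the distinction between the static keyword table and the dynamic user id table.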





Fig. 10.26 Representation of a sample keyword table using binary search trees 


The application of binary search trees for the representation of the keyword symbol table in compiler 
design can be probed further to bring in the application of AVL search trees as well. It is known 
that for a given set K of keywords, a finite set of binary search trees may be constructed. However, 
the problem now is to look for the most efficient representation amongst these binary search trees. 
Though a procedure exists to construct an optimal binary search tree, viz., a binary search tree which 
reports the minimum cost (see Illustrative Problem 10.4), the optimal binary search tree 
may still lead to comparatively inefficient retrievals for some keys due to the imbalance of nodes. 
It is in such a case that the application of AVL search trees becomes relevant. Representing the 
keyword table as an AVL search tree ensures the retrieval of tokens in O(log n) time in the worst case. 





Fig. 10.27 Searching for a keyword and a user id from a keyword symbol table represented using 
a binary search tree (paths shown: the successful search of token “while” and the unsuccessful search of token “AVERAGE-VAL”) 


Summary 


> Binary search trees may be empty or, if otherwise, are labeled binary trees in which all the 
keys in the left subtree of a node are less than the key of the node and all the keys in its 
right subtree are greater. All the keys forming a binary search tree are distinct. Binary 
search trees are represented using linked representations. However, in many cases it is 
convenient to represent them as extended binary trees. 

> A search or retrieval operation on a binary search tree is of O(log n) complexity on average 
and hence is more efficient than the same over a sequential list. The insert and delete 
operations are also of O(log n) average case complexity. 

> The insertion of a key in a binary search tree is similar to that of searching for the key 
(unsuccessfully) and inserting it at the appropriate position as a leaf node. The deletion of 
a node from a binary search tree is discussed depending on whether the deleted node is a leaf 
node, a node with a single subtree or a node with two subtrees. 

> Binary search trees suffer from the drawback of becoming skewed, especially when the keys 
are inserted in their sorted order. 

> AVL trees are height balanced trees with the balance factor of the nodes being either 0 or 
1 or -1. AVL search trees are height balanced binary search trees. 

> The search or retrieval operation on an AVL search tree is similar to that on a binary search 
tree. 

> The insertion operation on an AVL search tree is similar to that of a binary search tree, but 
when it leads to an imbalance of the tree, one of the rotations, viz., LL, LR, RL or RR, is 
undertaken to rebalance the tree. 

> The deletion operation on an AVL search tree is classified as that of a leaf node, a node 
with a single subtree or a node with two subtrees. In the event of any imbalance in the tree, 
the R category of rotations, viz., R0, R1 and R-1, or the L category of rotations, viz., 
L0, L1 and L-1, or a combination of them is called for to rebalance the tree. 

> Binary search trees and AVL search trees find application in the representation of symbol 
tables in compiler design. 


Illustrative Problems 


Problem 10.1 (a) Construct a binary search tree T for the following set S of elements in 
the order given: 
S = {INDIGO, GREEN, CYAN, YELLOW, RED, ORANGE, VIOLET} 
(b) How many comparisons are made for the retrieval of “yellow” from 
the tree drawn in 10.1(a)? 
(c) For what arrangements of elements of S will the associated binary 
search tree turn out to be skewed? 
(d) For the binary search tree(s) constructed in I 10.1(c), how many 
comparisons are made for the retrieval of “yellow”? 


Solution: (a) The binary search tree constructed for S with the elements considered in the order 
given is shown in Fig. I 10.1(a). 

(b) The number of comparisons for the retrieval of “yellow” is 2. The search key 
“yellow” is first compared against the root which is “indigo” and since “yellow” 
is greater than “indigo” moves to the right of “indigo” only to find itself after 
the second comparison. 

(c) If the elements of S are sorted either in the ascending or in the descending order 
then the associated binary search tree will be either right skewed or left skewed 
respectively. Figure I 10.1(b) illustrates the trees. 

(d) In the case of the left skewed tree the number of comparisons for “yellow” is 1 
and in the case of the right skewed tree the number of comparisons is 7. 
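
The comparison counts in parts (b) and (d) can be checked with a short Python sketch. The names `Node`, `insert`, `build` and `comparisons` are chosen here for illustration; keys are compared as ordinary strings, which for these upper case names coincides with alphabetical order.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key as a leaf at the position where an unsuccessful search ends."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def build(keys):
    """Build a binary search tree by inserting the keys in the order given."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def comparisons(root, key):
    """Number of nodes whose key is compared with the search key."""
    count = 0
    node = root
    while node is not None:
        count += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return count


S = ["INDIGO", "GREEN", "CYAN", "YELLOW", "RED", "ORANGE", "VIOLET"]

as_given = comparisons(build(S), "YELLOW")                            # part (b)
right_skewed = comparisons(build(sorted(S)), "YELLOW")                # ascending order
left_skewed = comparisons(build(sorted(S, reverse=True)), "YELLOW")   # descending order
```

The three counts come out as 2, 7 and 1 respectively, in agreement with the solution above.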


Fig. I 10.1(a) (construction steps shown: Insert INDIGO; Insert GREEN, CYAN; Insert YELLOW, RED; Insert ORANGE, VIOLET) 





Ascending order of S: {CYAN, GREEN, INDIGO, ORANGE, RED, VIOLET, YELLOW} 

Fig. I 10.1(b) (the corresponding binary search tree) 


Problem 10.2 Given the following binary search trees draw the same after the deletion of 
the specified elements in the respective binary search trees. 





[Figures: the given binary search trees for (i) Delete g, (ii) Delete d and (iii) Delete f] 





Solution: Deletion of f is a case of deletion of a node with a single non empty subtree with root 
i. Hence simply delete f and link the node e with node i. 


Binary search tree 
after deletion of f 


Deletion of g is a case of deletion of a node with two non empty subtrees. Delete g and push k 
along with its subtree to be the root. a joins as the left child of k. 


Binary search tree 
after deletion of g 


Deletion of d is again a case of deletion of a node with two non empty subtrees. After the deletion 
of d, k along with its subtrees moves up. a joins g as its left child. 


Binary search tree 
after deletion of d 







Problem 10.3 On the binary search tree shown in Fig. I 10.1(a), perform the following 
operations in the order shown: 
Insert GREY, Insert PINK, Delete YELLOW, Delete RED 


Solution: The trees after the two insert operations and two delete operations in the given sequence 
are shown below: 


[Figures: the trees after Insert GREY, Insert PINK, Delete YELLOW and Delete RED] 


Problem 10.4 [Optimal binary search trees] Let $S = \{a_1, a_2, a_3, \ldots, a_n\}$ be a set of elements 
and $T_S$ be the set of associated binary search trees that can be constructed out of S. Let $p_i$, 
$1 \le i \le n$, be the probability with which $a_i$ is searched for (probability of successful search) 
and let $q_j$, $0 \le j \le n$, be the probability with which a key X, $a_j < X < a_{j+1}$, is unsuccessfully 
searched for (probability of unsuccessful search) on a binary search tree. As explained in Sec. 10.2, the 
search for such an X will end up in an appropriate external node $e_j$. The cost of a binary search 
tree is given by 

$$\sum_{1 \le i \le n} p_i \cdot \mathrm{level}(a_i) \; + \sum_{0 \le j \le n} q_j \cdot (\mathrm{level}(e_j) - 1)$$

An optimal binary search tree is a tree $T \in T_S$ such that cost(T) is the minimum. The 
term $\sum_{0 \le j \le n} q_j \cdot (\mathrm{level}(e_j) - 1)$, where $q_j$, $0 \le j \le n$, represents the weights 
associated with the external nodes, is known as the weighted external path length. 

Consider a set S = {end, goto, print, stop}, and let 
$\{p_1, p_2, p_3, p_4\} = \{\tfrac{1}{20}, \tfrac{1}{5}, \tfrac{1}{10}, \tfrac{1}{20}\}$ and 
$\{q_0, q_1, q_2, q_3, q_4\} = \{\tfrac{1}{5}, \tfrac{1}{10}, \tfrac{1}{5}, \tfrac{1}{20}, \tfrac{1}{20}\}$. 
Find the cost of the following binary search trees and show which of the two has the minimum cost. 

[Figure: binary search trees (i) and (ii)] 





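Solution: The cost of tree (i) is given by 

$$\left(\tfrac{1}{20}\cdot 2 + \tfrac{1}{5}\cdot 1 + \tfrac{1}{10}\cdot 2 + \tfrac{1}{20}\cdot 3\right) + \left(\tfrac{1}{5}\cdot 2 + \tfrac{1}{10}\cdot 2 + \tfrac{1}{5}\cdot 2 + \tfrac{1}{20}\cdot 3 + \tfrac{1}{20}\cdot 3\right) = \tfrac{13}{20} + \tfrac{26}{20} = \tfrac{39}{20}$$

The cost of tree (ii) is given by 

$$\left(\tfrac{1}{20}\cdot 2 + \tfrac{1}{5}\cdot 3 + \tfrac{1}{10}\cdot 1 + \tfrac{1}{20}\cdot 2\right) + \left(\tfrac{1}{5}\cdot 2 + \tfrac{1}{10}\cdot 3 + \tfrac{1}{5}\cdot 3 + \tfrac{1}{20}\cdot 2 + \tfrac{1}{20}\cdot 2\right) = \tfrac{18}{20} + \tfrac{30}{20} = \tfrac{48}{20}$$

Hence of the two binary search trees, tree (i) is the minimum cost binary search tree. 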
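
The cost computation can be reproduced with a small Python sketch. Several things here are assumptions: the tuple representation of a tree, the function name `cost`, and above all the shapes of trees (i) and (ii) and the probability values, which are inferred from the level terms and totals of the cost expressions in the solution rather than read from the figure. Levels are counted from 1 at the root, and the external nodes met in an inorder walk are taken to be e_0, e_1, ..., e_n in that order.

```python
from fractions import Fraction as F


def cost(tree, p, q):
    """Cost of a binary search tree given as nested (key, left, right) tuples.

    None stands for an external node. p lists p_1..p_n and q lists q_0..q_n.
    """
    internal_levels = []   # levels of a_1..a_n, collected in inorder
    external_levels = []   # levels of e_0..e_n, collected in inorder

    def walk(node, level):
        if node is None:
            external_levels.append(level)
            return
        _, left, right = node
        walk(left, level + 1)
        internal_levels.append(level)
        walk(right, level + 1)

    walk(tree, 1)
    successful = sum(pi * lv for pi, lv in zip(p, internal_levels))
    unsuccessful = sum(qj * (lv - 1) for qj, lv in zip(q, external_levels))
    return successful + unsuccessful


p = [F(1, 20), F(1, 5), F(1, 10), F(1, 20)]            # for end, goto, print, stop
q = [F(1, 5), F(1, 10), F(1, 5), F(1, 20), F(1, 20)]   # q_0 .. q_4

# Assumed shapes, inferred from the level terms of the two cost expressions.
tree_i = ("goto", ("end", None, None), ("print", None, ("stop", None, None)))
tree_ii = ("print", ("end", None, ("goto", None, None)), ("stop", None, None))

cost_i = cost(tree_i, p, q)
cost_ii = cost(tree_ii, p, q)
```

Using exact fractions avoids rounding, so `cost_i` evaluates to 39/20 and `cost_ii` to 48/20 (reduced to 12/5), matching the hand computation.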


Problem 10.5 Which of the following is an AVL tree? 


[Figure: candidate trees (i), (ii) and (iii)] 


Solution: Tree (iii) 


Problem 10.6 Construct an AVL search tree using the following data. Perform the 
appropriate rotations to rebalance the tree. 
OS/2, LINUX, DOS, UNIX, XENIX, MAC 

Solution: The construction of the AVL search tree is as shown below: 


[Figures: the tree after each step: Insert OS/2; Insert LINUX; Insert DOS, followed by an LL rotation w.r.t. A; Insert UNIX; Insert XENIX, followed by an RR rotation w.r.t. A; Insert MAC, followed by a rotation w.r.t. A] 
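
The construction can be traced with the following Python sketch of AVL insertion. It is a standard height-based formulation written for this illustration, not the text's own algorithm, and all names (`Node`, `insert`, `rotate_right`, `rotate_left`, `log`) are assumptions. Keys are compared as plain strings, and the balance factor is height(left) - height(right).

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1


def h(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(h(node.left), h(node.right))


def bf(node):
    return h(node.left) - h(node.right)


def rotate_right(a):
    """Single rotation used for the LL case; returns the new subtree root."""
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b


def rotate_left(a):
    """Single rotation used for the RR case; returns the new subtree root."""
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b


def insert(node, key, log):
    """Insert key, rebalance on the way back up, and log (rotation, node A)."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key, log)
    else:
        node.right = insert(node.right, key, log)
    update(node)
    if bf(node) == 2:                       # left subtree too tall
        if bf(node.left) >= 0:
            log.append(("LL", node.key))
            return rotate_right(node)
        log.append(("LR", node.key))
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if bf(node) == -2:                      # right subtree too tall
        if bf(node.right) <= 0:
            log.append(("RR", node.key))
            return rotate_left(node)
        log.append(("RL", node.key))
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def preorder(node):
    return [] if node is None else [node.key] + preorder(node.left) + preorder(node.right)


log = []
root = None
for key in ["OS/2", "LINUX", "DOS", "UNIX", "XENIX", "MAC"]:
    root = insert(root, key, log)
```

Under these rules the sketch logs an LL rotation about OS/2 after DOS, an RR rotation about OS/2 after XENIX and an RL rotation about LINUX after MAC, ending with OS/2 at the root.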





Problem 10.7 For the AVL search tree constructed in Illustrative Problem 10.6, perform 
the following operations using the original tree for each operation: 
Delete DOS, Delete UNIX, Delete OS/2 


Solution: Each of the delete operations as performed on the original tree is illustrated below: 

[Figures: the original tree and the trees after Delete UNIX, Delete DOS and Delete OS/2] 

Observe that the three cases are examples of the deletion of a leaf node, a node with a single 
subtree and a node with both subtrees respectively. Also the deletion operations are rotation free. 





Problem 10.8 For the following AVL search tree undertake the following deletions in the 
sequence shown: 
Delete I, Delete B, Delete H 


Solution: The balanced tree after the execution of the three operations in sequence is shown 
below: 

[Figures: Delete I, followed by an R1 rotation; Delete B; Delete H, followed by an R0 rotation] 

Here, while Delete I and Delete H called for R1 and R0 rotations respectively, Delete B did not 
call for any rotation. 







Problem 10.9 Perform the following deletions on the given AVL search trees: 





Solution: This problem is an illustration of L category rotations. Hence the notations A, B and C, 
and those of their subtrees, employed in the generic representations of the L rotations (Sec. 10.3) 
have been made use of in the trees for ease of understanding. 

[Figures: the given AVL search trees and the trees obtained after Delete bat and the ensuing L category rotations] 








Problem 10.10 Delete 75 from the following AVL search tree: 


[Figure: the given AVL search tree] 


Solution: The deletion of 75 leads to deletion of 79 and calls for two rotations viz., L1 and R-1 to 
set right the imbalance. The various steps during the deletion are shown below: 


[Figures: Step 1: Delete 75 leads to Delete 79; Step 2: Call L1 rotation; Step 3: Call R-1 rotation] 




Review Questions 


1. Which among the following norms is not satisfied by a binary search tree T ? 
(i) all keys of the binary search tree need not be distinct 
(ii) all keys in the left subtree of T are less than the root element 
(iii) all keys in the right subtree of T are greater than the root element and 
(iv) the left and right subtrees of T are also binary search trees. 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
2. State whether true or false: 
In the case of a binary search tree with n nodes, 
(i) the number of external nodes is (n — 1) 
(ii) an inorder traversal yields the keys of the nodes in the ascending order 
(a) (i) true (ii) true (b) (i) true (ii) false (c) (i) false (ii) true (d) (i) false (ii) false 
3. Which among the following deteriorates the performance of a binary search tree T with n 
nodes? 
(i) when there are large number of deletions to T 
(ii) when the number of external nodes becomes (n+1) 
(iii) when the height of T becomes n 
(iv) when the height of T becomes $\log_2 n$ 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
4. The balance factor of any node in an AVL tree cannot be 
(a) 0 (b) +1 (c) -1 (d) +3 
5. In an AVL tree, during the deletion of node t which is a leaf node and whose parent node 
is node p, which among the following does not happen during the sequence of operations? 
(i) physically delete node t and make the child link of its parent, viz., node p null. 
(ii) make the child link of node p point to the child node of node t and then physically 
delete node t 
(iii) update the balance factor of node p based on whether the deletion occurred to its right 
or left 
(iv) update the balance factors of the ancestor nodes of node p 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
6. If a key u to be deleted from a binary search tree has only a left subtree and if it were the 
right child of its parent node, then 
(i) set both the links of its parent node to NIL 
(ii) allow the right link of its parent node alone to be NIL 
(iii) allow the left link of its parent node to point to the left subtree of key u 
(iv) allow the right link of its parent node to point to the left subtree of key u 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
7. How are binary search tree representation of lists advantageous over their sequential list 
representations? 
8. How is the deletion of a node that has both left and right subtrees, undertaken in a binary 
search tree? 
9. What is the need for an AVL tree? 
10. How is the rotation free deletion of a node having both the subtrees, done in an AVL search 
tree? 


11. Outline the generic representation of an LL type imbalance in an AVL search tree and the 
corresponding rotation. 
12. For the following list of data construct a binary search tree: 
LINUX, OS2, DOS, XENIX, SOLARIS, WINDOWS, VISTA, XP, UNIX, CPM 
Undertake the following operations on the binary search tree: 
(i) Insert MAC (ii) Delete WINDOWS (iii) Delete UNIX 
13. Represent the data list shown in Review Question 10.12 as a sequential list. Tabulate the 
number of comparisons undertaken for retrieving the following keys: 
(i) LINUX (ii) XENIX (iii) DOS (iv) UNIX (v) CPM 
14. For the data list {AND, BEGIN, CASE, DO, END, FOR, GOTO, IF, IN, LET, NOT, OR, QUIT, 
READ, REPEAT, RESET, THEN, UNTIL, WHILE, XOR} 
(i) Construct a binary search tree. What are your observations? 
(ii) Construct an AVL search tree. 
15. For the AVL search tree constructed in Review Question 10.14 delete the following keys in 
the order given: 
XOR, READ, END, AND 


Programming Assignments 


1. Implement a menu-driven program to perform the following operations on a binary search 
tree: 
(i) Construct a binary search tree /* construction begins from an empty tree */ 
(ii) Insert element(s) into a non empty binary search tree 
(iii) Delete element(s) from a non empty binary search tree 
(iv) Search for an element in a binary search tree 
2. Write a function to retrieve the elements of a binary search tree in the sorted order. 
3. Execute a programming project to illustrate the construction of an AVL search tree. 
Demonstrate the construction, insertion and deletion operations using graphics and 
animation. 
4. [Huffman Coding] D. Huffman applied binary trees with minimal external path length to 
obtain an optimal set of codes for messages $M_j$, $0 \le j \le n$. Each message is encoded using 
a binary string for transmission. At the receiving end, the messages are decoded using a 
binary tree in which external nodes represent messages and the external path to the node 
represents the binary string encoding of the message. Thus if 0 labels a left branch and 1 a 
right branch, then a sample decode tree given below can decode messages $M_0$, $M_1$, $M_2$, $M_3$ 
represented using codes {0, 100, 101, 11}. These codes are called Huffman codes. The 
expected decode time is given by $\sum_{0 \le i \le n} q_i \cdot d_i$ where $d_i$ is the distance of the 
external node representing message $M_i$ from the root node and $q_i$ is the relative frequency 
with which the message $M_i$ is transmitted. It is obvious that the cost of decoding a message 
is dependent on the number of bits in the binary code of the message and hence on the external 
path to the node representing the message. Therefore the problem is to look for a decode tree with 
minimal weighted external path length to speed up decoding. 

[Figure: sample decode tree] 

(i) Investigate the algorithm given by D. Huffman to obtain a decode tree with minimal 
weighted external path length. 
(ii) Implement the algorithm given a sample set of weights $q_i$. 
5. Implement an algorithm which, given an AVL search tree T and a data item X, will split 
the AVL search tree into two AVL search trees $T_1$, $T_2$ such that all the keys in tree $T_1$ are less 
than or equal to X and all the keys in tree $T_2$ are greater than X. 


CHAPTER 


B TREES AND 
TRIES 


11.1 Introduction 
11.2 m-way search trees: Definition and Operations 
11.3 B Trees: Definition and Operations 
11.4 Tries: Definition and Operations 
11.5 Applications 


Introduction 11.1 


In this chapter, we discuss data structures pertaining to multi-way 
trees. Multi-way trees are tree data structures with more than two 
branches at a node. The data structures of m-way search trees, B 
trees and Tries belong to this category of tree structures. 

AVL search trees, which are height-balanced versions of binary 
search trees, no doubt promote efficient retrievals and storage 
operations. The complexity of insert, delete and search operations 
on AVL search trees is O(log n). However, for applications such as file indexing, 
where the number of entries in an index may be very large, maintaining the index as an m-way search 
tree provides a better option than an AVL search tree, which is after all only a balanced binary 
search tree. While binary search trees are two-way search trees, m-way search trees generalize them 
to multi-way branching; for the same number of keys they are shorter and hence favor more efficient 
retrievals. 

B trees are height balanced versions of m-way search trees and hence command their own 
merits. In the case of all the search trees which deal with key based storage and retrievals, it is 
essential that the keys are of fixed size for efficient storage management. In other words, these 
data structures are not well suited to keys of varying sizes. Tries are tree based 
data structures that support keys with varying sizes. Also, while other search trees 
search on the basis of the whole key, tries search on only a portion of the key before 
the whole key is retrieved or stored. 

The definition and operations of the three data structures, viz., m-way search trees, B trees and 
Tries, are detailed. Finally, the applications of the data structures to file indexing and spell 
checking are discussed. 





m-way search trees: Definition and Operations 11.2 


m-way search trees are extensions of binary search trees. Binary search trees permit at most 
two way branching at every node, with the left subtree of the node representing elements less than 
the key value of the node and the right subtree representing elements which are greater than the 
key value of the node. On the other hand, each node of an m-way search tree can hold at most 
m branches. Thus, m-way search trees adopt multi way branching, extending the above mentioned 
characteristic of binary search trees. 





Definition 
An m-way search tree T may be an empty tree. If T is non-empty then the following properties 
must hold good: 

(i) For some integer m, known as the order of the tree, each node has at most m child nodes. 
In other words, each node in T is of degree at most m. Thus a node of degree m will be 
represented by $C_0, (K_1, C_1), (K_2, C_2), (K_3, C_3), \ldots, (K_{m-1}, C_{m-1})$ where $K_i$, $1 \le i \le m-1$, are 
the keys and $C_j$, $0 \le j \le m-1$, are pointers to the root nodes of the m subtrees of the node. 

(ii) If a node has k child nodes, $k \le m$, then the node has exactly (k-1) keys $K_1, K_2, K_3, \ldots, K_{k-1}$ 
where $K_i < K_{i+1}$, and each of the keys $K_i$ partitions the keys in the subtrees into k subsets. 

(iii) For a node $C_0, (K_1, C_1), (K_2, C_2), (K_3, C_3), \ldots, (K_{m-1}, C_{m-1})$ all key values in the subtree 
pointed to by $C_i$ are less than the key $K_{i+1}$, $0 \le i \le m-2$, and the key values in the subtree 
pointed to by $C_{m-1}$ are greater than $K_{m-1}$. 

(iv) The subtrees pointed to by $C_i$, $0 \le i \le m-1$, are also m-way search trees. 


Node structure and representation 


An m-way search tree is conceived to be an extended tree with its null pointers represented by 
external nodes. Although this method of representation is useful for its definition and the discussion 
of its operations, the external nodes, as in the case of any general tree, are only fictitious and not 
physically represented. 

Figure 11.1 illustrates a general structure of a node in an m-way search tree. A node holding 
(m-1) key elements has exactly m child pointers to the root nodes of its m subtrees. Those 
pointers to subtrees which are empty are indicated by external nodes represented as circles. 


Fig. 11.1 Structure of a node in an m-way search tree (the node holds the keys $K_1, K_2, \ldots, K_{m-1}$ and the child pointers $C_0, C_1, \ldots, C_{m-1}$; the legend distinguishes pointers to external nodes from pointers to non empty subtrees) 


Example 11.1 An example 4-way search tree is shown in Fig. 11.2. Observe how each node 
has at most 4 child nodes some of which are external nodes. Also every node has exactly (p—1) 
keys if the number of child nodes it has is p. The root node for example has four child pointers 
and hence four subtrees. The number of keys in the root node is therefore three viz., [34, 56, 84]. 
The first subtree of the root node contains keys which are less than 34, the second subtree which 
is to contain keys greater than 34 and less than 56 is empty, the third subtree contains keys greater 
than 56 and less than 84 and finally the last subtree contains keys which are greater than 84. A 
similar concept is extended to every other node in the subtrees. 










Fig. 11.2 An example 4-way search tree 


Searching an m-way search tree 


Searching for a key K in an m-way search tree T, is an extension of the method by which the key 
would have been searched for in a binary search tree. K is first sequentially searched against the 
key elements of the root node [K; K; +1 ... K,]. If K = K; then the search is done. If K > K; and 
K < Kj, 1 for some j, then the search moves down to the root node of the corresponding subtree 
T;. The search progresses in a similar fashion until the key is obtained in which case the search 
is termed successful otherwise unsuccessful. 

Consider the sample 4-way search tree shown in Fig. 11.2. Searching for key 6 calls for moving
down the first subtree of the root [34, 56, 84], since 6 < 34. The search further moves down the
first subtree of the node [11, 20, 31] since 6 < 11. Now the node [6, 9] is reached. A mere sequential
search of the key in the list of elements reports the presence of 6. The path traced by the
successful search for 6 is shown in Fig. 11.3. Let us now search for key 66 in the tree shown
in Fig. 11.2. Since 66 > 56 and 66 < 84 the search moves down the third subtree of the root node.
Here the node [64, 72] is encountered and it is sequentially searched for 66. Since 66 > 64 and
66 < 72, the search moves down the second subtree of the node [64, 72] which is an external node.
Here the search terminates and the search is termed unsuccessful since the element 66 is not
found in the tree. The path traced by the search is shown in Fig. 11.3.
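The search just described can be sketched in Python as follows. This is only an illustrative sketch: the `Node` class and the `search` function are names assumed here, `None` stands for an external node, and only the nodes of Fig. 11.2 that are named in the text are reproduced.

```python
class Node:
    """Node of an m-way search tree: sorted keys plus len(keys) + 1 child slots."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        # None stands for an empty subtree (external node).
        self.children = children or [None] * (len(self.keys) + 1)

def search(node, key):
    """Return (list of nodes visited, True/False for successful/unsuccessful)."""
    path = []
    while node is not None:
        path.append(node.keys)
        i = 0
        while i < len(node.keys) and key > node.keys[i]:   # sequential search
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return path, True
        node = node.children[i]        # move down the corresponding subtree
    return path, False                 # fell off at an external node

# Partial reconstruction of Fig. 11.2 (only nodes mentioned in the text).
root = Node([34, 56, 84], [
    Node([11, 20, 31], [Node([6, 9]), None, None, None]),
    None,
    Node([64, 72]),
    Node([90, 99]),
])

print(search(root, 6))    # successful, three nodes visited
print(search(root, 66))   # unsuccessful, two nodes visited
```

The returned path corresponds to the nodes highlighted in Fig. 11.3 for each of the two searches.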


Fig. 11.3 Search for keys 6 and 66 from the 4-way search tree shown in Fig. 11.2 (left: successful search for 6; right: unsuccessful search for 66; the keys involved in each search are marked)







Inserting into an m-way search tree 


The insertion of a key into an m-way search tree proceeds as one would search for the key. The
search is bound to fall off at some node in the tree. At that position, the key is either inserted
as an element of that node, if the node can accommodate the key, or inserted as a new node in the
next level.

Consider the insertion of key 95 in Fig. 11.2. Searching for 95 results in falling off the tree at 
the node [90, 99]. Since the tree is a 4-way search tree every node in the tree can accommodate 
at most 3 keys. Therefore we merely insert 95 in the node [90, 99] to obtain [90, 95, 99]. 
Accordingly observe the change in the pointer fields of the node. Let us now insert 25 into the 
same tree. In this case searching for 25 results in falling off the tree at an external node belonging 
to [11, 20, 31]. Since the node is already full with three elements, we insert 25 as a new node in 
the subtree of [11, 20, 31]. Figure 11.4 illustrates the insertion of keys 95 and 25 in the given 
4-way search tree. 
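A minimal Python sketch of this insertion rule is given below. The names `Node` and `insert` are assumptions made for illustration, `None` marks an empty subtree, and the tree is a partial reconstruction of Fig. 11.2 containing only the nodes named in the text.

```python
class Node:
    """Node of an m-way search tree; None marks an empty subtree."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or [None] * (len(self.keys) + 1)

def insert(root, key, m):
    """Insert key into an m-way search tree and return the root."""
    if root is None:
        return Node([key])
    node = root
    while True:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return root                       # key already present
        if node.children[i] is not None:
            node = node.children[i]           # keep searching
            continue
        # The search fell off the tree at child slot i of this node.
        if len(node.keys) < m - 1:            # node can accommodate the key
            node.keys.insert(i, key)
            node.children.insert(i, None)     # new child pointer field
        else:                                 # node is full: open a new node
            node.children[i] = Node([key])
        return root

root = Node([34, 56, 84], [
    Node([11, 20, 31], [Node([6, 9]), None, None, None]),
    None,
    Node([64, 72]),
    Node([90, 99]),
])
insert(root, 95, 4)
insert(root, 25, 4)
print(root.children[3].keys)                  # node that received 95
print(root.children[0].children[2].keys)      # new node opened for 25
```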





Fig. 11.4 Insertion of keys 95 and 25 into the 4-way search tree shown in Fig. 11.2 (the new child pointer fields are marked)


Deleting from an m-way search tree 


The delete operation, as always, is more complicated than its insert counterpart. To delete a
key we proceed as usual to find the key in the tree. Now let us suppose
the key K is found in a node with its left subtree pointer as $C_i$ and its right subtree pointer as $C_j$.
Based on the following cases (Dm in the cases indicates Deletion in an m-way search tree) the
deletion of K is undertaken:


Case Dm. 1 $C_i = C_j$ = NIL. If the left and right subtrees of key K are NIL, then we simply delete
the key K and adjust the number of pointer fields in the node.


Case Dm. 2 $C_i$ = NIL and $C_j \neq$ NIL. If the left subtree of key K is empty and the right subtree
is not, choose the smallest key K’ from the right subtree of K and replace K with K’. This in turn
may recursively call for the appropriate deletion of K’ from the tree following one or more of the
four cases.


Case Dm. 3 $C_i \neq$ NIL and $C_j$ = NIL. If the right subtree of key K is empty and the left subtree
is not, choose the largest key K” from the left subtree of K and replace K with K”. This in turn
may recursively call for the appropriate deletion of K” from the tree following one or more of the
four cases.







Case Dm. 4 $C_i \neq$ NIL and $C_j \neq$ NIL. If the left subtree and the right subtree of key K are non
empty, then choose either the largest key from the left subtree or the smallest key from the right
subtree. Call it K”. Replace key K with the same and as before undertake appropriate steps to
delete K” from the tree.

Let us delete 9 from the 4-way search tree shown in Fig. 11.2. The deletion belongs to Case Dm. 1
where both the left and right subtrees of key 9 are empty. We simply delete 9 from the node.
The operation delete 11 belongs to Case Dm. 3 where the left subtree of key 11 is non empty and
its right subtree is empty. We replace 11 with the largest key from its left subtree viz., 9 and delete
9 following Case Dm. 1. Finally, delete 84 illustrates Case Dm. 4 where both its left and right
subtrees are not empty. In such a case, we choose to replace 84 with the smallest key from the
right subtree of 84, viz., 90. The deletion of 90 in turn follows Case Dm. 1. The 4-way search tree
after each of the deletions of 9, 11 and 84 is illustrated in Fig. 11.5 (a-c).
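The four cases can be sketched in Python as shown below. This is a hedged illustration rather than a definitive implementation: `Node`, `smallest`, `largest`, `delete` and `build` are names assumed here, Case Dm. 4 always picks the smallest key of the right subtree (as the worked example does for 84), and `build` reproduces only the nodes of Fig. 11.2 that the text names.

```python
class Node:
    """Node of an m-way search tree; None marks an empty subtree."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or [None] * (len(self.keys) + 1)

def smallest(node):
    while node.children[0] is not None:
        node = node.children[0]
    return node.keys[0]

def largest(node):
    while node.children[-1] is not None:
        node = node.children[-1]
    return node.keys[-1]

def delete(node, key):
    """Delete key from the subtree rooted at node; return the updated subtree."""
    if node is None:
        return None                               # key not in the tree
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i == len(node.keys) or node.keys[i] != key:
        node.children[i] = delete(node.children[i], key)
        return node
    left, right = node.children[i], node.children[i + 1]
    if left is None and right is None:            # Case Dm. 1
        del node.keys[i]
        del node.children[i]                      # one pointer field fewer
        return node if node.keys else None
    if right is None:                             # Case Dm. 3
        repl = largest(left)
        node.keys[i] = repl
        node.children[i] = delete(left, repl)
    else:                                         # Cases Dm. 2 and Dm. 4
        repl = smallest(right)
        node.keys[i] = repl
        node.children[i + 1] = delete(right, repl)
    return node

def build():
    # Partial reconstruction of Fig. 11.2 (only nodes mentioned in the text).
    return Node([34, 56, 84], [
        Node([11, 20, 31], [Node([6, 9]), None, None, None]),
        None,
        Node([64, 72]),
        Node([90, 99]),
    ])

t9, t11, t84 = delete(build(), 9), delete(build(), 11), delete(build(), 84)
print(t9.children[0].children[0].keys)
print(t11.children[0].keys, t11.children[0].children[0].keys)
print(t84.keys, t84.children[3].keys)
```

Each deletion is performed on a fresh copy of the tree, mirroring the independent deletions of Fig. 11.5.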





Fig. 11.5 Deletion (independent) of keys 9, 11 and 84 from the 4-way search tree shown in
Fig. 11.2: (a) Delete 9, (b) Delete 11, (c) Delete 84







Example 11.2 Consider a 5-way search tree shown in Fig. 11.6. Let us insert B, Y and L in
the sequence given into the original tree. Fig. 11.7 illustrates the insertions. While B and L call for
insertion of the keys in the existing nodes, Y needs a new node to be opened since the node
[S, T, X, Z] is full.





Fig. 11.6 An example 5-way search tree 


Fig. 11.7 Insertion of B, Y, L into the 5-way search tree shown in Fig. 11.6


Example 11.3 In the 5-way search tree shown in Fig. 11.6, let us delete T, G, Q with each 
deletion independent of the other and performed on the original tree. 

Delete T follows Case Dm. 1 of Sec. 11.2. We simply delete T and change the pointer fields
accordingly. Delete G follows Case Dm. 3. We replace G with the largest key value chosen from
its left subtree viz., F. F is deleted from its original position following Case Dm. 1. Delete Q
follows Case Dm. 4. Q is replaced by the smallest element in the right subtree of Q viz., S. S is
deleted from its original position following Case Dm. 1. Figure 11.8 illustrates each of the three
deletions.





Fig. 11.8 Deletion (independent) of T, G and Q from the 5-way search tree shown in Fig. 11.7: (a) Delete T, (b) Delete G, (c) Delete Q


Drawbacks of m-way search trees 


It is evident that the complexity of a search, insert and delete operation on an m-way search tree
of height h (excluding external nodes) is given by O(h). An m-way search tree of height h can have
elements whose number lies between a minimum of h and a maximum of $m^h - 1$. A minimum
of h elements would mean having one node with one element at each level. The
maximum would be possible when each node has m child nodes and each node on each level has
(m - 1) elements. This would imply the maximum number of nodes in any level i to be given by
$m^{i-1}$. The total number of nodes in an m-way search tree of height h would be given by

$$\sum_{i=1}^{h} m^{i-1} = \frac{m^h - 1}{m - 1}.$$

Hence the maximum number of elements in the m-way search tree of height h would be given by

$$\frac{m^h - 1}{m - 1} \cdot (m - 1) = m^h - 1.$$




Since the number of elements in an m-way search tree of height h varies from a minimum of
h to a maximum of $m^h - 1$, if the tree represents n elements then the height varies from a
minimum of $\log_m(n + 1)$ to a maximum of n. Thus in the worst case the height of the m-way search
tree representing n elements may be O(n) resulting in poor performance. Hence it is essential that
even m-way search trees are maintained with balanced heights. B trees are height balanced m-way
search trees and this data structure is detailed in the next section.
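The counting argument above can be checked numerically with a few lines of Python. The function names `max_nodes` and `max_elements` are assumptions introduced only for this check.

```python
def max_nodes(m, h):
    """Maximum number of nodes: level i holds at most m**(i-1) nodes."""
    return sum(m ** (i - 1) for i in range(1, h + 1))

def max_elements(m, h):
    """Each node holds at most (m - 1) elements."""
    return max_nodes(m, h) * (m - 1)

for m, h in [(2, 3), (4, 3), (5, 4)]:
    assert max_nodes(m, h) == (m ** h - 1) // (m - 1)
    assert max_elements(m, h) == m ** h - 1

print(max_nodes(4, 3), max_elements(4, 3))   # 21 nodes, 63 elements
```

For a 4-way search tree of height 3, for instance, the levels hold at most 1, 4 and 16 nodes, giving 21 nodes and 63 elements.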





B Trees: Definition and Operations 


As pointed out in Sec. 11.2, if the growth of an m-way search tree is left unchecked, it
may result in a tree which yields a complexity of O(n) in the worst case, thereby deteriorating
its performance. Hence the need for balanced m-way search trees which are known as B trees of
order m. B trees assure a complexity of O(log n) for their search, insert and delete operations.


Definition 
A B tree of order m is an m-way search tree and hence may be empty. If non empty, then the
following properties are satisfied on its extended tree representation:

(i) The root node must have at least two child nodes and at most m child nodes

(ii) All internal nodes other than the root node must have at least $\lceil m/2 \rceil$ non empty child nodes
and at most m non empty child nodes

(iii) The number of keys in each internal node is one less than its number of child nodes and
these keys partition the keys of the tree into subtrees in a manner similar to that of m-way
search trees

(iv) All external nodes are at the same level

The node structure and representation of a B tree of order m is similar to that of an m-way 
search tree (Sec. 11.2). Figure 11.9 illustrates a B tree of order 5. The properties of B trees may be 
easily verified on the example tree. 


Fig. 11.9 A B tree of order 5


All internal nodes of the tree except the root have at least $\lceil m/2 \rceil = \lceil 5/2 \rceil = 3$ child nodes and hence
have at least two keys in their respective nodes. The root node [46, 77] has at least two child
nodes. All the external nodes indicated by circles are on the same level. The keys in each of the
nodes partition the tree into subtrees following the principle of 5-way search trees.
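Properties (i) to (iv) lend themselves to a mechanical check. The sketch below is one possible way to do it in Python; `Node` and `is_b_tree` are assumed names, `None` represents an external node, and the sample tree uses the nodes of Fig. 11.9 quoted in the text, with the middle leaf [55, 70] assumed for illustration.

```python
from math import ceil

class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or [None] * (len(self.keys) + 1)

def is_b_tree(root, m):
    """Check properties (i)-(iv) of a B tree of order m on the extended tree."""
    if root is None:
        return True
    external_levels = set()

    def check(node, level, lo, hi, is_root):
        if node is None:                       # external node
            external_levels.add(level)
            return True
        kids = len(node.children)
        min_kids = 2 if is_root else ceil(m / 2)
        if not (min_kids <= kids <= m):        # properties (i) and (ii)
            return False
        if kids != len(node.keys) + 1:         # property (iii), key count
            return False
        bounds = [lo] + node.keys + [hi]       # property (iii), partitioning
        for a, b in zip(bounds, bounds[1:]):
            if a is not None and b is not None and a >= b:
                return False
        return all(check(c, level + 1, bounds[i], bounds[i + 1], False)
                   for i, c in enumerate(node.children))

    # Property (iv): every external node must sit on one and the same level.
    return check(root, 1, None, None, True) and len(external_levels) == 1

good = Node([46, 77], [Node([11, 28, 34]), Node([55, 70]), Node([80, 85, 90, 91])])
bad = Node([46, 77], [Node([11, 28, 34]), Node([55]), Node([80, 85, 90, 91])])
print(is_b_tree(good, 5), is_b_tree(bad, 5))
```

The second tree fails because the node [55] has only two child nodes, fewer than the three required of a non-root node of a B tree of order 5.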







B trees of order 3 are known as 2-3 trees since each of their internal nodes has only two or
three child nodes. Figure 11.10 illustrates a B tree of order 3. B trees of order 4 are called 2-4 trees
or 2-3-4 trees. Rudolf Bayer (1972), who first discussed the 2-4 tree, called it a symmetric binary
B tree.





Fig. 11.10 A B tree of order 3 (2-3 tree)


Searching a B tree of order m 


The search procedure for a B tree of order m is the same as the one applied on m-way search trees.
The complexity of a search procedure is given by O(h) where h is the height of the B tree of order
m (see Sec. 11.3).


Inserting into a B tree of order m 


Inserting a key into a B tree of order m proceeds as one would to search for the key. However 
at the point where the search falls off the tree, the key is inserted based on the following norms 
(IB in the cases indicates Insertion in a B tree): 


Case IB. 1 If the node X of the B tree of order m, where the key K is to be inserted, can
accommodate K, then it is inserted in the node and the number of child pointer fields is
appropriately updated.

Thus if node X is given by [$C_0$, ($K_1$, $C_1$), ($K_2$, $C_2$), ..., ($K_i$, $C_i$), ($K_{i+1}$, $C_{i+1}$), ..., ($K_p$, $C_p$)],
p < (m - 1), and K is such that $K_i < K < K_{i+1}$, then we merely insert (K, C), where C is the child
pointer field, at the appropriate position in the node X. The updated node X is given by
[$C_0$, ($K_1$, $C_1$), ($K_2$, $C_2$), ..., ($K_i$, $C_i$), (K, C), ($K_{i+1}$, $C_{i+1}$), ..., ($K_p$, $C_p$)].

Figure 11.11(b) illustrates Case IB. 1 type of insertion in the generic node shown in
Fig. 11.11(a).

Case IB. 2 If the node X where the key K is to be inserted is full, then we virtually insert K
into the list of elements and split the list into two at its median $K_{median}$. The keys which are less
than $K_{median}$ form a node $X_{left}$, and those greater than $K_{median}$ form another node $X_{right}$. The median
element $K_{median}$ is pulled up to be inserted in the parent node of X. This insertion may in turn call
for Case IB. 1 or Case IB. 2 depending on whether the parent node can accommodate $K_{median}$ or
not.

Figure 11.11(c) illustrates Case IB. 2 type of insertion in the generic node shown in
Fig. 11.11(a).

Fig. 11.11 Insertion of a key K in a B tree of order m: (a) a generic node (NODE X) before insertion of key K, (b) insertion of type Case IB.1, (c) insertion of type Case IB.2

Let us insert 29 into the B tree of order 5 shown in Fig. 11.9. The search for 29 falls off the tree
at the node [11, 28, 34]. Since the node can accommodate one more element, 29 is inserted into the
node with appropriate changes made in the number of child pointers. Figure 11.12(a) illustrates
the B tree after insertion of 29. On the other hand, the insertion of 96 into the B tree results in
the search falling off the node [80, 85, 90, 91]. However, the node is full. We now virtually insert
96 into the list to obtain [80, 85, 90, 91, 96] and split the list into two at its median viz., 90. While
[80, 85] form one node, [91, 96] form another node and the median element 90 is pulled up to
be inserted in the parent node [46, 77]. Since the parent which is also the root can accommodate
up to two more elements, 90 is inserted into the node to obtain the new root [46, 77, 90].
Figure 11.12(b) illustrates the insertion of 96 in the B tree.

Fig. 11.12 Insertion of 29 and 96 in the B tree of order 5 shown in Fig. 11.9: (a) Insert 29, (b) Insert 96
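Cases IB. 1 and IB. 2 can be sketched in Python as follows. The sketch inserts the key first and splits afterwards, as the text describes; `Node`, `insert`, `_insert` and `fig_11_9` are assumed names, and the leaf [55, 70] of the sample tree is an assumption made for illustration.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = children or [None] * (len(self.keys) + 1)

def _insert(node, key, m):
    """Insert key below node; return (median, right_node) if node had to split."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        return None                               # key already present
    if node.children[i] is None:                  # the search fell off here
        node.keys.insert(i, key)
        node.children.insert(i, None)
    else:
        promoted = _insert(node.children[i], key, m)
        if promoted is None:
            return None
        median, right = promoted                  # a child split: absorb median
        node.keys.insert(i, median)
        node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:                   # Case IB. 1: it fits
        return None
    mid = len(node.keys) // 2                     # Case IB. 2: split at median
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return median, right

def insert(root, key, m):
    if root is None:
        return Node([key])
    promoted = _insert(root, key, m)
    if promoted is None:
        return root
    median, right = promoted
    return Node([median], [root, right])          # the tree grows by one level

def fig_11_9():
    return Node([46, 77], [Node([11, 28, 34]), Node([55, 70]),
                           Node([80, 85, 90, 91])])

a = insert(fig_11_9(), 29, 5)
b = insert(fig_11_9(), 96, 5)
print(a.children[0].keys)
print(b.keys, b.children[2].keys, b.children[3].keys)
```

Because a split is propagated through the return value, a split of the root simply creates a new root holding the median, which is how the height of a B tree increases.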


Example 11.4 For the 2-3 tree shown in Fig. 11.10 let us perform the following operations 
in the sequence given: Insert L, Insert W and Insert M. 


Fig. 11.13 Insertion of L, W and M for the 2-3 tree shown in Fig. 11.10







Insert L is a Case IB. 1 type of insertion and is accommodated straightaway in node [K]. Insert
W on the other hand is more involved. It calls for the application of Case IB. 2 twice. W is virtually
inserted into the node [V, X] to obtain the list [V, W, X]. The list [V, W, X] splits into two nodes
[V] and [X] pulling W up to be inserted in the node [H, T]. This again triggers a split of the list
[H, T, W] into two nodes [H] and [W] with [T] further pulled up to act as the new root. Insert
M triggers Case IB. 2 to obtain the nodes [K] and [M] with L accommodated in its parent as
[H, L].


Deletion from a B tree of order m 


The deletion of a key K from a B tree of order m may trigger various cases. It may be as simple
as Cases DB. 1-2 or as complicated as Cases DB. 3-4 (here DB indicates Deletion from a B tree).


Case DB.1 Key K belongs to a leaf node and its deletion does not result in the node having less 
than its minimum number of elements. This deletion is the simplest of the cases. In such a case 
we merely delete the element from the leaf node and adjust the child pointers accordingly. 


Case DB. 2 Key K belongs to a non leaf node. In such a case replace K with the largest key (K’) 
in the left subtree of K or the smallest key (K”) from the right subtree of K and follow steps to 
delete K’ or K” from the node. K’ or K” is bound to occur in a leaf node (why?) and hence triggers 
Case DB. 1 for their deletion. 

Consider the B tree of order 5 shown in Fig. 11.9. Let us delete 80 and 77 both undertaken 
independently on the original B tree. Note that deletion of 80 follows Case DB.1. We merely 
delete 80 and adjust the child pointers. Deletion of 77 on the other hand follows Case DB. 2. We 
replace 77 with 80 the smallest key in its right subtree. The deletion of 80 now follows Case DB.1. 
Figure 11.14 shows the deletion of 80 and 77 from the B tree of order 5 (Fig. 11.9). 


Fig. 11.14 Deletion of 80 and 77 from the B tree of order 5 shown in Fig. 11.9


Deletions may turn out to be complicated when they leave less than the minimum number of 
elements in the nodes concerned. Cases DB. 3-4 illustrate these instances. 







Case DB. 3 When the deletion of a key K from a node X leaves it with less than its minimum 
number of elements, elements are borrowed from one of its left or right sibling nodes. Thus if the 
left sibling node has elements to spare, move the largest key K’ in the left sibling node to the 
parent node. The intervening element P in the parent node is moved down to set right the 
vacancy created by the deletion of K in node X. 

If the left sibling node has no element to spare it would be a waste of time to move to the right 
sibling node to check if there is an element to spare. What if after the check we were to realize 
that there were no elements to be spared by the right sibling node as well? In such a case we 
proceed to Case DB. 4 which covers the case when either of the sibling nodes have no elements 
to offer. 


Case DB. 4 When the deletion of a key K from a node X leaves its elements to be less than the 
stipulated minimum number and if the first tested sibling node (left or right) or both the sibling 
nodes are unable to spare an element, node X is merged with one of the sibling nodes along with 
the intervening element P in the parent node. We shall choose to test for the availability of 
element from the left sibling node first. If there is no element available to be spared, then the 
elements of the left sibling node are merged with those of node X and the intervening parent 
element P to create a new node. This in turn calls for the deletion of element P which may trigger 
one or more of the cases discussed above. 

Consider the deletion of 44 from the B tree of order 5 shown in Fig. 11.15(a). This is a direct
illustration of Case DB. 3. The operation would leave the node [36, 44] with less than its minimum
number of keys and therefore we borrow 18 from the left sibling node which has an element to
spare. Now 18 replaces the intervening parent element 20 and 20 moves down to fill the space
created by the deletion of 44 in the node. Figure 11.15(b) illustrates the deletion of 44.

Fig. 11.15 Deletion of 44 and 36 from a B tree of order 5: (a) B tree of order 5, (b) B tree of order 5 after deletion of 44, (c) B tree of order 5 after deletion of 36

Let us proceed to delete 36 from the resulting B tree of order 5 shown in Fig. 11.15(b). Note 
this is a direct illustration of Case DB.4. Since the left sibling node has no elements to spare, we 
merge the left sibling node with the intervening parent element 18 and the node containing the 
only element after deletion of 36, viz., [20]. The new node [11, 12, 18, 20] is now a prospective 
child node. To decide on its appropriate parent we proceed to delete 18 from the parent node. 
Again the deletion of 18 from its node is problem free since the parent node can afford to do the 
same. Thus we obtain [55, 76] to be the updated parent node. The new node [11, 12, 18, 20] joins 
the root as its first subtree. 
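The borrow (Case DB. 3) and merge (Case DB. 4) steps can be sketched on a single parent and its leaf nodes, each held as a plain Python list. This is a simplified illustration under stated assumptions: `delete_from_leaf` is an assumed name, underflow of the parent after a merge is not handled, a leftmost node falls back on its right sibling, and the two rightmost leaves of the sample data are placeholders rather than a faithful copy of Fig. 11.15(a).

```python
from math import ceil

def delete_from_leaf(parent_keys, leaves, idx, key, m):
    """Delete key from leaves[idx] and report which case was applied."""
    min_keys = ceil(m / 2) - 1
    node = leaves[idx]
    node.remove(key)
    if len(node) >= min_keys:
        return "DB.1"
    left = leaves[idx - 1] if idx > 0 else None
    right = leaves[idx + 1] if idx + 1 < len(leaves) else None
    if left is not None and len(left) > min_keys:      # Case DB. 3 (left sibling)
        node.insert(0, parent_keys[idx - 1])           # parent element moves down
        parent_keys[idx - 1] = left.pop()              # largest left key moves up
        return "DB.3"
    if left is None and right is not None and len(right) > min_keys:
        node.append(parent_keys[idx])                  # mirror image of DB. 3
        parent_keys[idx] = right.pop(0)
        return "DB.3"
    if left is not None:                               # Case DB. 4: merge with left
        left.append(parent_keys.pop(idx - 1))
        left.extend(node)
        del leaves[idx]
    else:                                              # leftmost: merge with right
        node.append(parent_keys.pop(idx))
        node.extend(right)
        del leaves[idx + 1]
    return "DB.4"

parent = [20, 55, 76]
leaves = [[11, 12, 18], [36, 44], [60, 65, 70], [85, 91, 94, 99]]
first = delete_from_leaf(parent, leaves, 1, 44, 5)
after_44 = (list(parent), [list(leaf) for leaf in leaves])
second = delete_from_leaf(parent, leaves, 1, 36, 5)
print(first, after_44)
print(second, parent, leaves)
```

As in the text, the left sibling is tested first, and when it has nothing to spare the node is merged with it straightaway.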


Example 11.5 Consider the B tree of order 3 shown in Fig. 11.10. Let us delete V and T both
undertaken independently on the original tree.

Deletion of V is a direct illustration of Case DB. 1. We merely delete V and adjust the child
pointers of the node. Figure 11.16(a) shows the B tree after deletion of V. Deletion of T illustrates
Case DB. 2 and hence replaces T with V, the smallest key in the right subtree. Deletion of V from
its original position in the node follows Case DB. 1. Figure 11.16(b) illustrates the B tree after
deletion of T.


Fig. 11.16 Deletion of V and T from the B tree of order 3 shown in Fig. 11.10: (a) Delete V, (b) Delete T





Example 11.6 Given the B tree of order 3 shown in Fig. 11.17, let us delete M. To avoid 
clutter, the snapshots of the B trees during the delete process are shown without pointer fields. 
Observe how the deletion triggers Case DB.4 repeatedly before the tree gets balanced. M is a 
single key in a leaf node and its deletion would leave the node with zero elements. To borrow 
from the node [G] is futile and hence we undertake a merge operation as discussed in Case DB. 
4 and this yields [G, K]. This leaves the parent node concerned with no elements and hence once 
again triggers Case DB.4. The new parent node is [B, F]. Observe the rearrangement of the child
pointers of the node. Case DB. 4 is once again triggered with regard to the empty parent of
[B, F]. Finally the tree balances itself by settling on [O, T] as its root. 





Fig. 11.17 Deletion of M from a B tree of order 3 (the B tree is shown without pointers; panel (b) shows the steps of the delete process of M)




Height of a B tree of order m 


If a B tree of order m and height h has n elements then n satisfies $n \le m^h - 1$. This is true since
a B tree of order m is basically an m-way search tree (Sec. 11.2). Now having determined the
upper bound of n, what is its lower bound? In other words what is the minimum number of
elements that a B tree of order m and height h can hold? To obtain this let us find out what is
the minimum number of nodes in levels 1, 2, ..., (h+1). Here (h+1) is the level at which the external
nodes reside. Since each internal node other than the root has a minimum of $\lceil m/2 \rceil$ child nodes,
the root has a minimum of two child nodes, and level 1 has just one node (the root), the minimum
number of nodes in each level beginning from 1 and ending at (h+1) in the sequential order would be

$$1,\; 2,\; 2\lceil m/2 \rceil,\; 2\lceil m/2 \rceil^{2},\; \ldots,\; 2\lceil m/2 \rceil^{h-1}$$

respectively. Thus the number of external nodes on level (h + 1) would be $2\lceil m/2 \rceil^{h-1}$. Since the
number of elements in the B tree is one less than the number of external nodes, the lower bound
of n is given by $n \ge 2\lceil m/2 \rceil^{h-1} - 1$. Hence we have

$$2\lceil m/2 \rceil^{h-1} - 1 \;\le\; n \;\le\; m^h - 1.$$

From this we can easily infer that

$$\log_m (n+1) \;\le\; h \;\le\; \log_{\lceil m/2 \rceil}\!\left(\frac{n+1}{2}\right) + 1.$$

This determines the best case and worst case
complexities of a search, insert and delete operation on B trees which is generally given by O(h),
the height of the B tree.
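As a numerical illustration of these bounds, the following Python sketch computes the minimum and maximum number of elements for a given height, and the range of heights possible for a given n. The function names are assumptions, and integer arithmetic is used in place of logarithms to avoid rounding issues.

```python
from math import ceil

def min_elements(m, h):
    """Lower bound on n: 2 * ceil(m/2)**(h-1) - 1."""
    return 2 * ceil(m / 2) ** (h - 1) - 1

def max_elements(m, h):
    """Upper bound on n: m**h - 1."""
    return m ** h - 1

def height_bounds(m, n):
    """Smallest and largest height h of a B tree of order m holding n >= 1 elements."""
    lo = 1
    while max_elements(m, lo) < n:        # need m**h - 1 >= n
        lo += 1
    hi = lo
    while min_elements(m, hi + 1) <= n:   # need 2 * ceil(m/2)**(h-1) - 1 <= n
        hi += 1
    return lo, hi

print(min_elements(5, 3), max_elements(5, 3))   # 17 124
print(height_bounds(5, 1000))                   # (5, 6)
```

For a B tree of order 5 holding 1000 elements the height therefore lies between 5 and 6, in agreement with $\log_5 1001 \approx 4.29$ and $\log_3 500.5 + 1 \approx 6.66$.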





Tries: Definition and Operations 


Search trees in general favor keys which are of fixed size since this leads to efficient storage 
management. However in the case of applications which are retrieval based and which call for 
keys of varying length, tries provide better options. Search trees indulge in multi-way branching 
based on the whole key and hence searching is done based on key comparisons. In contrast, 
though tries are also multi-way branched trees, searching is based only on a portion of a key and 
not on the whole, before it is completely retrieved or stored. 

Tries are also called lexicographic search trees. The name trie (pronounced as “try”) originated
from the word “retrieval”.


Definition and representation 


A trie of order m may be empty. If non empty, then it consists of an ordered sequence of exactly 
m tries of order m. The branching at any level of the trie is determined only by a portion and not 
by the whole key. 

Alphabetical keys require a trie of order 27 (26 letters of the alphabet + a blank (“ ”)) for their
storage and retrieval. Each branch of the trie partitions the keys into groups beginning with a
specific letter.

Thus tries have two categories of node structures, viz., branch node and information node. A
branch node is merely a collection of LINK fields each pointing either to a branch node or to an
information node. An information node holds the key that is to be stored in the trie. For example,
in the case of alphabetical keys, each branch node has 27 LINK fields, one for each of the 26 
alphabet characters and one for a blank (“ “). The keys are stored in the information nodes. To 
access an information node containing a key, we need to move down a branch node or a series 
of branch nodes following the appropriate branch based on the alphabetical characters composing 
the key. All LINK fields that neither point to a branch node nor to an information node are 
represented using null pointers. To avoid clutter, null pointers have not been represented using 
any special notations. 

Figure 11.18 illustrates an example trie for alphabetical keys. The trie stores the keys CAR, 
CARRIAGE, CARAVAN, BIKE, BUS, TRAIN, BICYCLE, AEROPLANE. The information nodes 
wholly store the keys. To access these information nodes, we follow a path beginning from a 
branch node moving down each level depending on the characters forming the key, until the 
appropriate information node holding the key is reached. Thus the depth of an information node 
in a trie depends on the similarity of its first few characters (prefix) with its fellow keys. Here, 
while AEROPLANE and TRAIN occupy shallow levels (level 1 branch node) in the trie, CAR, 
CARRIAGE, CARAVAN have moved down by four levels of branch nodes due to their uniform 
prefixes “CAR”. Observe how we move down each level of the branch node with the help of the 
characters forming the key. The role played by the blank field in the branch node is evident when 
we move down the trie to access CAR. While the information node pertaining to CAR positions 
itself under the blank field, those of CARAVAN and CARRIAGE attach themselves to pointers 
from A and R respectively of the same branch node. 


Fig. 11.18 An example trie







Searching a trie 


To search for a key K in a trie T, we begin at the root which is a branch node. Let us suppose the
key K is made up of characters $k_1 k_2 k_3 \ldots k_n$. The first character of the key K viz., $k_1$ is extracted
and the LINK field corresponding to the letter $k_1$ in the root branch node is spotted. If LINK(T,
$k_1$), the LINK field of character $k_1$ corresponding to the branch node T, is equal to NIL, then the
search is unsuccessful, since no such key is found. If LINK(T, $k_1$) is not equal to NIL, then the
LINK field may either point to an information node or a branch node. If it points to an
information node, the search is successful when that node holds K and unsuccessful otherwise.
If it points to a branch node, this implies the presence of key(s) with a similar prefix. We
extract the next character, $k_2$, of key K and move
down the LINK field corresponding to $k_2$ in the branch node encountered at level 2 and so on
until the key is found in an information node or the search is unsuccessful. The deeper the search,
the more there are keys with similar but longer prefixes.


Example 11.7 Consider the trie T shown in Fig. 11.18. Let us search for the keys TRAIN and 
CARAVAN. To search for the key TRAIN, we extract the first character ‘T’ and move down the 
LINK field corresponding to ‘T’ in the root branch node. The retrieval is successful since the 
information node corresponding to the LINK holds the key TRAIN. Let us proceed to retrieve 
CARAVAN. The first character C urges the search process to move down LINK(T, ‘C’) in the first 
branch node. The second character A again leads one to move down to the next level and so does 
R. At level four, the LINK field corresponding to the fourth character viz., ‘A’ leads to an information 
node holding the key CARAVAN. The path can be easily traced on the trie shown. 
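The search can be sketched in Python as shown below. As a simplifying assumption, each branch node is a `dict` mapping a character to its LINK field (a missing entry plays the role of NIL) instead of an array of 27 fields, an information node is simply the key string, and the trie is a reconstruction of Fig. 11.18 from the description in the text. The names `BLANK`, `search` and `trie` are hypothetical.

```python
BLANK = " "   # the blank field of a branch node

def search(trie, key):
    """Return True if key is stored in the trie, False otherwise."""
    node, depth = trie, 0
    while True:
        # Once the characters of the key are exhausted, follow the blank field.
        ch = key[depth] if depth < len(key) else BLANK
        child = node.get(ch)              # LINK(node, ch); None stands for NIL
        if child is None:
            return False
        if isinstance(child, str):        # information node
            return child == key
        node, depth = child, depth + 1    # branch node: go one level down

trie = {
    "A": "AEROPLANE",
    "B": {"I": {"C": "BICYCLE", "K": "BIKE"}, "U": "BUS"},
    "C": {"A": {"R": {BLANK: "CAR", "A": "CARAVAN", "R": "CARRIAGE"}}},
    "T": "TRAIN",
}

for k in ["TRAIN", "CARAVAN", "CAR", "TRAM", "CART"]:
    print(k, search(trie, k))
```

A dictionary keeps the sketch short; an array of 27 LINK fields indexed by character, as in the text, would behave identically.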


Insertion into a trie 


To insert a key K into a trie we begin as we would to search for the key K, possibly moving down 
the trie, following the appropriate LINK fields of the branch nodes, corresponding to the 
characters of the key. At the point where the LINK field of the branch node leads to NIL, the key 
K is inserted as an information node. 


Example 11.8 Consider the trie shown in Fig. 11.18. Let us insert SHIP and TRAM into the
trie. Insertion of SHIP is simple and straightforward. The LINK field corresponding to the first
character S in the root node is NIL and hence we insert SHIP as an information node in the
appropriate place of the root branch node. In the case of TRAM, the LINK field corresponding
to ‘T’ in the root branch node points to an information node holding TRAIN. This implies that
there is already a key with a uniform prefix available in the trie. We now remove TRAIN and
instead open a branch node to accommodate both TRAIN and TRAM. And lo! the second character
of the two keys matches and so does the third! Since the matching prefix of TRAIN and TRAM
(“TRA”) is of length 3, the situation now calls for opening three levels of branch nodes other than
the root node. It is only at level 4 that TRAIN and TRAM can be inserted as information nodes
corresponding to the LINK fields of I and M respectively. Figure 11.19 illustrates the insertion of
SHIP and TRAM in the trie shown in Fig. 11.18.
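A possible Python sketch of the insertion, including the opening of new branch nodes when an information node is already in the way, is given below. Branch nodes are assumed to be `dict` objects and information nodes plain strings; `BLANK`, `char_at` and `insert` are hypothetical names.

```python
BLANK = " "

def char_at(key, depth):
    """Character of key used for branching at this depth (blank when exhausted)."""
    return key[depth] if depth < len(key) else BLANK

def insert(node, key, depth=0):
    """Insert key below the branch node `node`, which is consulted at `depth`."""
    ch = char_at(key, depth)
    child = node.get(ch)
    if child is None:                       # NIL link: attach an information node
        node[ch] = key
    elif isinstance(child, dict):           # branch node: keep moving down
        insert(child, key, depth + 1)
    elif child != key:                      # information node holding another key
        branch = {}                         # open a new branch node in its place
        node[ch] = branch
        insert(branch, child, depth + 1)    # re-insert the displaced key
        insert(branch, key, depth + 1)      # may open further branch nodes

trie = {"A": "AEROPLANE", "T": "TRAIN"}     # a fragment of Fig. 11.18
insert(trie, "SHIP")
insert(trie, "TRAM")
print(trie["S"])
print(trie["T"])
```

Inserting TRAM opens three branch nodes below the root, one for each character of the common prefix “TRA”, before TRAIN and TRAM separate under I and M.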


Fig. 11.19 Insertion of SHIP and TRAM into the trie shown in Fig. 11.18


Deletion from a trie


The deletion of a key K from a trie proceeds as one would to search for the key. On reaching the
information node (NODE I) holding K, the same is deleted. But deletion does not merely stop
with this. It needs to be checked whether the branch node to which NODE I is linked
accommodates other information nodes as well. If more than one information node is
linked to the branch node concerned, or if there is at least one LINK field to another branch node,
or both, then the deletion is done. We merely delete the information node holding the key. On
the other hand, if the deletion of NODE I leaves the branch node with just one more key
(information node NODE I’), then there is no reason why the branch node should be retained at
all. We delete the branch node and push NODE I’ to a level higher. If the situation leads to NODE
I’ being the only non empty node in the current branch node, once again we delete the branch
node and push NODE I’ higher until it finds a position in a branch node that makes the best use
of its LINK fields. Since the deletions are sensitive to the number of keys (information nodes) that
are present in a branch node, it would be prudent to include a COUNT field in each branch node
recording the number of information nodes that are attached to the branch node.


Example 11.9 Let us delete CAR and BIKE from the trie shown in Fig. 11.18. To delete CAR, 
we search for it moving down four levels of branch nodes and spot it at an information node 
before deleting the same. This leaves us with the specific branch node holding two more keys viz., 
CARAVAN and CARRIAGE. Therefore there is nothing that can be done to the branch node and 
the deletion of CAR is deemed complete. 

In the case of BIKE we proceed as before and delete the information node holding the key. But 
note this leaves the branch node with a single key BICYCLE. We therefore delete the branch node 
and proceed to accommodate BICYCLE in the branch node that is a level higher. The current 
branch node holds the key BUS in the LINK field corresponding to ‘U’. The key BICYCLE is 
attached to it corresponding to the LINK field of I. Figure 11.20 shows the deletion of CAR and 
BIKE from the trie shown in Fig. 11.18. 
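The deletion, together with the pushing up of a lone information node, can be sketched in Python as follows. Branch nodes are again assumed to be `dict` objects and information nodes plain strings; the length of the dictionary stands in for the COUNT field, and `delete` and `fig_11_18` are hypothetical names.

```python
BLANK = " "

def delete(node, key, depth=0):
    """Delete key below branch node `node`; return what the parent LINK should hold."""
    ch = key[depth] if depth < len(key) else BLANK
    child = node.get(ch)
    if child is None:
        return node                                  # key is not in the trie
    if isinstance(child, dict):
        node[ch] = delete(child, key, depth + 1)     # child may collapse to a key
    elif child == key:
        del node[ch]                                 # remove the information node
    # A non-root branch node left with a single information node is dropped
    # and that information node is pushed one level higher.
    if depth > 0 and len(node) == 1:
        (only,) = node.values()
        if isinstance(only, str):
            return only
    return node

def fig_11_18():
    return {
        "A": "AEROPLANE",
        "B": {"I": {"C": "BICYCLE", "K": "BIKE"}, "U": "BUS"},
        "C": {"A": {"R": {BLANK: "CAR", "A": "CARAVAN", "R": "CARRIAGE"}}},
        "T": "TRAIN",
    }

t1 = delete(fig_11_18(), "CAR")
t2 = delete(fig_11_18(), "BIKE")
print(t1["C"]["A"]["R"])
print(t2["B"])
```

After deleting BIKE, BICYCLE hangs directly off the LINK field of I in the branch node that also holds BUS, as described in Example 11.9.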







Fig. 11.20 Deletion of CAR and BIKE from the trie shown in Fig. 11.18 


Some remarks on tries 


The performance of search trees is determined by the number of keys that form the tree. Recall 
that the complexities of the search, delete and insert operations were given by O(h) where the 
height h is dependent on the number of keys represented in the search tree. In contrast, the 
performance of the trie depends on the length of the key (the number of characters forming 
the key) rather than on the number of keys itself. For example, if every key of a trie has 
length 7, then the trie can represent 26^7 = 8031810176 combinations of keys and, with the 
maximum length of uniform prefixes within the keys being 6, can retrieve a key in at most 7 
comparisons. In contrast, a search tree such as a binary search tree would need approximately 
log_2(26^7) ≈ 33 comparisons for the same! 
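The two figures quoted above can be checked with a few lines of arithmetic. The snippet is only an illustration; the variable names are chosen here and do not come from the text.

```python
import math

ALPHABET = 26     # letters available at each position of a key
KEY_LENGTH = 7    # every key is assumed to have exactly 7 characters

capacity = ALPHABET ** KEY_LENGTH
bst_comparisons = math.ceil(KEY_LENGTH * math.log2(ALPHABET))

print(capacity)          # 8031810176
print(bst_comparisons)   # 33
```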

In general however, most applications involve keys of large lengths and the number of keys 
to be represented in the trie may be sparse when compared to its capacity. In such a case, tries 
may be expected to perform worse than their search tree counterparts. It is therefore 
recommended that tries which are multi-way trees are used in judicious combinations with other 
search trees. 


11.5 Applications 


Most data structures, including search trees such as binary search trees and AVL trees, are 
suitable for internal searching, i.e., searching related to small files that can be accommodated in 
the internal memory of the computer. Applications that call for very large files or 
databases with voluminous records cannot accommodate these files in the internal memory 
of the computer. Hence these need to be stored in external memory, also called auxiliary 
storage or external storage, on devices such as hard disks, drums etc. While internal memory access is 
very fast, the problem with these external devices is that accesses are time consuming. For 
example, to retrieve a record residing on a hard disk, the block in which the record resides is 
first accessed, next the entire block of records is read and finally the required record 
is retrieved. Hence it is essential that files stored in external memory resort to strategies 
and techniques resulting in their efficient retrieval and storage. Chapter 14 details methods of file 
organization. It is in this context that multi-way trees such as m-way search trees, B trees and tries 
find their application. 


File indexing 


Retrieval of records from large files or databases stored in external memory is time consuming. 
To promote efficient retrievals, file indexes are maintained. An index is a <key, address> pair. The 
purpose of indexing is to expedite the search process or retrieval of a record. Though more than 
one file management technique employs indexing, Indexed Sequential Access 
Method (ISAM) based files have been the foremost in using indexing for efficient retrievals (see 
Sec. 14.7). The records of the file are stored sequentially and, for each block of records, the largest 
key and the block address are stored in an index. To retrieve a record whose key is K, the index 
is first searched to obtain the address of the block, and thereafter a sequential search of the block 
should yield the desired record. Figure 11.21 illustrates an ISAM file structure. However, if the 
file is too large, then an index over the indexes may have to be built. 
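The two-step retrieval described above (search the index, then scan the block) can be sketched as follows. This is an illustrative sketch only: the blocks are held in an in-memory list whose positions stand in for block addresses, the sample records are loosely based on Fig. 11.21, and the name `isam_lookup` is hypothetical.

```python
from bisect import bisect_left

# Records stored sequentially in blocks (sample data, assumed for illustration).
blocks = [
    [(101, "Audi"), (156, "Chevrolet"), (222, "Benz")],
    [(342, "Maruti"), (562, "Fiat")],
    [(881, "Chrysler"), (936, "Mercedes")],
]

# One index entry per block: <largest key in the block, block address>.
index = [(block[-1][0], address) for address, block in enumerate(blocks)]

def isam_lookup(key):
    """Return the record stored under key, or None if it is absent."""
    largest_keys = [k for k, _ in index]
    pos = bisect_left(largest_keys, key)    # first block whose largest key >= key
    if pos == len(index):
        return None                         # key exceeds every key in the file
    _, address = index[pos]
    for k, record in blocks[address]:       # sequential search within the block
        if k == key:
            return record
    return None

print(isam_lookup(156))   # Chevrolet
print(isam_lookup(300))   # None
```

Only the index is searched by key comparison across blocks; a single block is then read and scanned, which is the point of the scheme when each block access is expensive.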


[Figure: index entries holding the block addresses #B1, #B2, ..., #Bn, each pointing to a block 
(Block 1, Block 2, ..., Block n) of sequentially stored records; Block 1, for example, holds the 
records (101, Audi), (156, Chevrolet) and (222, Benz). #Bi denotes the address of block Bi.] 


Fig. 11.21 An ISAM file structure 







From the above it is clear that efficient retrievals now are dependent on the indexes. Though 
indexes are basically look up tables, it is essential that they are represented using efficient data 
structures to expedite retrievals. It is here that one finds the application of multi-way trees such 
as m-way search trees, B trees and tries. 


B trees as file indexes 


B trees are ideally suited for storing file indexes. Each internal node of the B tree stores the <key, 
address> pair. Their balanced heights call for fewer node accesses during the retrievals. Once the 
key is found the address of the record is also accessed along with it thereby speeding up the 
retrieval process. 


B+ trees as file indexes 


B+ trees are descendants of B trees. They satisfy all the properties of B trees but for a modification 
in the structure of the leaf nodes. While leaf nodes in B trees hold null pointers (external nodes 
in fact), the leaf nodes of B+ trees point to storage areas which contain records having the 
appropriate key values or pointers to each of these records. Therefore in a B+ tree to retrieve a 
record given its key, it is essential that the search traverses down to a leaf node to retrieve its 
address. The non leaf nodes only serve to help the search process traverse downwards towards 
the appropriate leaf node. Figure 11.22 illustrates a file index stored as a B+ tree. 










[Figure: a B+ tree whose leaf nodes point to storage areas holding entries such as 
<101, #R101>, <108, #R108>, ..., <111, #R111>, ..., <134, #R134>, ..., <146, #R146>, ..., 
<208, #R208>, ..., <546, #R546>, ... 
<Ki, #Ri>: key Ki and address #Ri of the record Ri holding the key Ki.] 





Fig. 11.22 B+ tree representation of a file index 







In comparison to B+ trees, B trees are more efficient for the following reasons: 

(i) The <key, address> pairs of the records are stored directly in the internal nodes of the B tree 
only once, thereby saving on storage. In contrast, B+ trees store the keys in duplicate, once 
in the internal nodes and again in the storage areas pointed to by the leaf node pointers. 

(ii) Unlike B+ trees, to access the <key, address> pair in a B tree, there is no need to traverse 
down the whole tree to reach the leaf node. The <key, address> pair may be found directly 
in the respective internal nodes. Therefore the keys may be reached in fewer accesses when 
compared to B+ trees. 


Spell checker 


Most word processing software embeds a spell checker that works online, as the text is typed. Any 
incorrectly spelt word is automatically highlighted. Tries find an application in this problem. The 
words of a dictionary are stored as a trie held in the memory of the computer. Every time a 
word is typed, the word is searched for in the trie and anything which does not lead to an 
information node is highlighted as an incorrectly spelt word. However, practical considerations 
call for curtailing the size and storage requirements of the trie since the whole trie needs to be 
present in the memory at the time of spell checking. 
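The idea can be illustrated with a deliberately simplified trie. The sketch below is an assumption-laden illustration rather than the trie of this chapter: it uses one dict level per character and an assumed end-of-word marker `$` in place of separate branch and information nodes, and the function names are hypothetical.

```python
END = "$"  # assumed end-of-word marker playing the role of an information node

def build_trie(words):
    """Store every dictionary word as a path of nested dicts."""
    root = {}
    for word in words:
        node = root
        for ch in word.upper():
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def is_spelt_correctly(trie, word):
    """A word is accepted only if its path ends at an end-of-word marker."""
    node = trie
    for ch in word.upper():
        node = node.get(ch)
        if node is None:
            return False
    return END in node

def misspelt_words(trie, text):
    """Return the words of text that would be highlighted."""
    return [w for w in text.split() if not is_spelt_correctly(trie, w)]

dictionary = build_trie(["car", "caravan", "bus", "bike"])
print(misspelt_words(dictionary, "bus carr bike cara"))  # ['carr', 'cara']
```

Note that `cara` is rejected even though it is a prefix of a stored word: the search reaches a node but not one that marks the end of a key.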


Summary 


> m-way search trees, B trees of order m and tries are examples of multi-way search trees. 

> m-way search trees are extensions of binary search trees. Searching for a key in a m-way 
search tree is similar to that in a binary search tree. Insertion of a key in a node is directly 
done if the node has less than its maximum number of elements (m-1). On the other hand 
if the node is full then we insert the key as a new node in the next level at the appropriate 
position. 

> To delete a key from a m-way search tree, if both the left and right subtrees of the key are 
empty then merely delete the key. If any one of the subtrees is non empty or both are 
non empty, then we replace the key to be deleted by either the largest key in its left 
subtree or the smallest key in its right subtree as the case may be. 

> Since the height of a m-way search tree of n elements varies from a minimum of log_m(n + 1) to a 
maximum of n, the worst case performance of the tree may yield O(n). Hence the need for 
height balanced m-way search trees. 

> B trees of order m are height balanced m-way search trees. The insertion of an element in 
a B tree is direct if the node is partially full. If the node is full, the key is virtually inserted 
into the list of keys in the node and the same is split into two nodes at its median element. 
The median element is pushed up to be accommodated in the parent node. This in turn 
may trigger further adjustments amongst the key elements of the parent node. 










> The deletion of a key from a leaf node in a B tree is direct if it does not leave the node with 
less than its minimum number of elements. If the deletion belongs to a non leaf node then 
replace it with either the largest key in its left subtree or the smallest in its right subtree. 
If the deletion leaves a node with less than its minimum number of elements, then borrow 
an element either from the left sibling node or the right sibling node provided they have 
an element to spare. Otherwise, merge the node with one of its sibling nodes along with 
the intervening parent element to form a new node. This calls for the deletion of the 
intervening parent element concerned. 

> Tries are search trees based on searching with portions of keys rather than the whole keys 
themselves. The search, insert and delete operations proceed down the trie along the branch 
nodes to reach the information node where the appropriate operation is carried out. 

> The applications of B trees to file indexing and of tries to spell checking have been discussed. 


Illustrative Problems 


Problem 11.1 For the 5-way search tree shown in Fig. I 11.1 perform the following operations 
in the sequence shown: 
Insert u, Delete z, Insert b, Delete p 





Fig. I 11.1 


Solution: For insertion of u, the node [q, r, s, t] is full and hence cannot accommodate u. Therefore 
we create a new node [u]. For delete z operation, since the left and right subtree pointers of z are 
NIL, we can easily delete z from the node [w, x, y, z]. Insert b accommodates b into the node [a, 
c] to form the node [a, b, c]. Lastly, delete p calls for the case where the left and right subtrees of 
the key to be deleted are not NIL. We choose the smallest key of its right subtree, viz., q 
and replace p with q. In turn q is deleted from its node directly since its left and right subtree 
pointers are NIL. The snapshots of the 5-way search tree for the given operations are shown 
below: 







[Figure: snapshots of the 5-way search tree after Insert u and Delete z, and after Insert b and Delete p] 


Problem 11.2 What is the maximum and minimum number of elements a 100-way search 
tree of height 3 can hold? What is its maximum and minimum height if (100)^2 elements are 
represented in the tree? 


Solution: Following the discussion of Sec. 11.2, the maximum number of elements a 100-way 
search tree of height 3 can hold is (100)^3 - 1 while the minimum number it can hold is equal 
to its height which is given to be 3. If the tree were to represent (100)^2 elements, then 
the maximum height of the search tree would be (100)^2 and the minimum height would be 
log_100((100)^2 + 1) ≈ 2. 


Problem 11.3 For the 5-way search tree shown in Fig. I 11.1 how many comparisons are 
needed to search for the elements n and z? 


Solution: Searching for n requires 8 comparisons and searching for z requires 7 comparisons. 


Problem 11.4 Which of the following is a B tree of order 7? If so, why? 


Solution: The tree shown in (i) is a B tree of order 7, since the following properties appropriate 
to the tree are satisfied: 
(i) The root node must have at least two child nodes and that of the given tree has 3 child nodes. 
(ii) All internal nodes other than the root have at least ⌈7/2⌉ = 4 child nodes. 
(iii) Each node of p child nodes has (p - 1) key elements. 
(iv) All external nodes are at the same level. 
The tree shown in (ii) is not a B tree of order 7 (but possibly a 7-way search tree) since all the 
external nodes are not on the same level and some of the internal nodes have less than ⌈7/2⌉ = 4 
child nodes. 


Problem 11.5 In the tree shown in figure (i) of Illustrative Problem 11.4 undertake the 
following operations: 
Insert 456, Insert 97 


Solution: To insert 456 into the B tree of order 7 shown in the figure (i) of Illustrative 
Problem 11.4, we arrive at the node [200, 301, 400, 500, 600, 726] while searching for it. Since the 
node is full we virtually insert 456 into the node and split the node into two at its median which 
is 456. While 456 is absorbed in the root which can accommodate it, the split nodes are [200, 301, 
400] and [500, 600, 726]. 
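The virtual insertion and split described above can be sketched as follows. This is a minimal sketch on the key list of a single node only: child pointers and the absorption of the median into the parent are left out, and `split_full_node` is a name chosen here.

```python
from bisect import insort

def split_full_node(keys, new_key):
    """Virtually insert new_key into a full node and split it at the median.

    Returns (left keys, median key, right keys); the median is the key
    that is pushed up to the parent node.
    """
    merged = list(keys)
    insort(merged, new_key)          # keep the keys in sorted order
    mid = len(merged) // 2           # position of the median key
    return merged[:mid], merged[mid], merged[mid + 1:]

left, median, right = split_full_node([200, 301, 400, 500, 600, 726], 456)
print(left, median, right)  # [200, 301, 400] 456 [500, 600, 726]
```

For an odd order such as 7 the overfull list has an odd number of keys, so the median is unique; for an even order the choice between the two middle keys is a convention, and the sketch takes the upper one.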

The insertion of 97 is straightforward: it is accommodated in the node [20, 45, 65]. The B 
tree after the two insertions is shown below: 











[Figure: the B tree of order 7 after the insertion of 456 and 97] 


Problem 11.6 In the B tree of order 7 obtained in Illustrative Problem 11.5, perform the 
following operations in the sequence given: 
delete 600, delete 800 and delete 20 


Solution: Delete 600 leaves the node [500, 600, 726] with fewer than its minimum number of 
elements. To borrow an element from its sibling (left sibling) is futile, since it would leave the 
sibling node concerned with less than its minimum number of elements. Therefore the left sibling 
node [200, 301, 400] is merged with [500, 726] along with its intervening parent element 456. 
Deletion of 800 in the resulting tree is done by merely replacing 800 by the smallest key in its 
right subtree viz., 841. Deletion of 20 is simple and direct. The key 20 is merely removed from 
its node. The B tree, after the three deletions that have been carried out is shown below: 


[Figure: the B tree of order 7 after the deletion of 600, 800 and 20] 
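The underflow test and the merge used for the deletion of 600 in Problem 11.6 can be sketched as follows. Again this works on key lists only, with child pointers omitted; the helper names are assumptions made for the illustration.

```python
import math

def min_keys(order):
    """Minimum number of keys in a non-root node of a B tree of this order."""
    return math.ceil(order / 2) - 1

def merge_with_left_sibling(left_sibling, parent_key, node):
    """Merge a deficient node with its left sibling and the intervening parent key."""
    return left_sibling + [parent_key] + node

ORDER = 7
node = [500, 600, 726]
left_sibling = [200, 301, 400]

node.remove(600)
underflow = len(node) < min_keys(ORDER)             # 2 keys < 3: node is deficient
can_borrow = len(left_sibling) > min_keys(ORDER)    # sibling has no key to spare

merged = merge_with_left_sibling(left_sibling, 456, node)
print(underflow, can_borrow)  # True False
print(merged)                 # [200, 301, 400, 456, 500, 726]
```

The merged node holds 6 keys, which is exactly the maximum of m - 1 for order 7, so the merge never overflows when neither sibling could spare a key.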


Problem 11.7 The preorder traversal of a B tree of order m is undertaken by visiting all entries 
of the root node first, followed by traversing each of the subtrees from left to right in preorder. 
A postorder traversal of a B tree of order m is undertaken by first traversing all its subtrees from left 
to right in post order and finally visiting all the entries in the root node. 

For the final snapshot of the 2-3 tree shown in Fig. 11.17(b) in the text, undertake preorder and 
post order traversals. 


Solution: The preorder traversal of the 2-3 tree yields: OT BFADGKQPRXVZ 
The postorder traversal of the 2-3 tree yields: ADGKBFPRQVZXOT 
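The two traversals can be sketched as below. The tree literal is an assumption: Fig. 11.17(b) is not reproduced here, so the 2-3 tree has been reconstructed from the traversal sequences stated in the solution. Each node is modelled as a pair (list of keys, list of child nodes).

```python
def preorder(node):
    """Visit all keys of the node, then each subtree from left to right."""
    keys, children = node
    visited = list(keys)
    for child in children:
        visited += preorder(child)
    return visited

def postorder(node):
    """Traverse each subtree from left to right, then visit the node's keys."""
    keys, children = node
    visited = []
    for child in children:
        visited += postorder(child)
    return visited + list(keys)

def leaf(*keys):
    return (list(keys), [])

# Assumed shape of the 2-3 tree, inferred from the stated traversals.
tree = (["O", "T"], [
    (["B", "F"], [leaf("A"), leaf("D"), leaf("G", "K")]),
    (["Q"], [leaf("P"), leaf("R")]),
    (["X"], [leaf("V"), leaf("Z")]),
])

print("".join(preorder(tree)))   # OTBFADGKQPRXVZ
print("".join(postorder(tree)))  # ADGKBFPRQVZXOT
```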





Problem 11.8 Construct a trie for the binary keys 011, 111, 101, 001 


Solution: The trie for the binary keys is as given below: 

[Figure: trie for the binary keys 011, 111, 101 and 001] 


Problem 11.9 Perform the following operations on the trie constructed in Illustrative 
Problem 11.8: 
Insert 01, Insert 11, Delete 011, Delete 001 


Solution: The trie after the insert operations is as follows: 

[Figure: the trie after the insertion of 01 and 11] 


The relevant portion of the trie after the deletion of 011 is as given below: 





[Figure: the relevant portion of the trie after the deletion of 011] 


The relevant portion of the trie after the deletion of 001 is as given below: 


Problem 11.10 Construct (i) 2-3 tree and (ii) trie for the following keys in the order of their 
appearance: 
CAT, CAN, PAN, PAT, MAN, MAT, MAP 


Solution: 


(i) The snapshots of the 2-3 tree during its construction are shown below: 






[Figure: snapshots of the 2-3 tree after the insertion of CAT, CAN, PAN, PAT, MAN and after the 
insertion of MAT, MAP] 


(ii) The trie after insertion of the key elements is shown below: 





Review Questions 


1. Which among the following properties does not hold good for an m-way search tree? 
(i) each node has at least m child nodes. 
(ii) if a node has k child nodes, k ≤ m, then the node has exactly (k - 1) keys and each of the 
keys K_i partitions the keys in the subtrees into k subsets. 
(iii) for a node C_0, (K_1, C_1), (K_2, C_2), (K_3, C_3), ... (K_{m-1}, C_{m-1}) all key values in the sub tree 
pointed to by C_i are less than the key K_{i+1}, 0 ≤ i ≤ m - 2 and the key values in the sub 
tree pointed to by C_{m-1} are greater than K_{m-1}. 
(iv) the subtrees pointed to by C_i, 0 ≤ i ≤ m - 1 are also m-way search trees. 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
2. To delete key K found in a node of an m-way search tree, with its left subtree empty 
and right subtree non empty, 
(i) simply delete the key K 
(ii) choose the smallest key K' from the right subtree of K and replace K with K', which 
in turn may recursively call for the appropriate deletion of K' from the tree 
(iii) choose the largest key K'' from the left subtree of K and replace K with K'' which in 
turn may recursively call for the appropriate deletion of K'' from the tree. 
(iv) choose either the largest key from the left subtree or the smallest key from the right 
subtree (K''') and replace key K with K''' which in turn may recursively call for the 
appropriate deletion of K''' from the tree. 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 


3. Which among the following properties is not satisfied by a B tree of order m? 
(i) The root node must have at least m child nodes, m > 1. 
(ii) All internal nodes other than the root node must have at least ⌈m/2⌉ non empty child 
nodes and at most m non empty child nodes. 
(iii) The number of keys in each internal node is one less than its number of child nodes 
and these keys partition the keys of the tree into subtrees in a manner similar to that 
of m-way search trees. 
(iv) All external nodes are at the same level. 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
4. In the context of insertion of a key K into a B tree of order m, state whether true or false: 
(i) If the node X of the B tree of order m, where the key K is to be inserted, can 
accommodate K, then it is inserted in the node and the number of child pointer fields 
is appropriately upgraded. 
(ii) If the node X where the key K is to be inserted is full, then we apparently insert K into 
the list of elements and split the list into two at its median K_median. 
(a) (i) true (ii) true (b) (i) true (ii) false 
(c) (i) false (ii) true (d) (i) false (ii) false 
5. Which among the following properties is not satisfied by a trie? 
(i) keys are stored in the information nodes. 
(ii) the depth of an information node in a trie always depends on the length of the key. 
(iii) to access an information node containing a key, we need to move down a branch node 
or a series of branch nodes following the appropriate branch based on the alphabetical 
characters composing the key. 
(iv) a branch node is merely a collection of pointers to either a branch node or an 
information node. 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 


6. What are the merits of m-way search trees over AVL search trees? 
7. What are the demerits of m-way search trees? 
8. What is the need for B trees? 
9. Distinguish between 2-3 trees and 2-4 trees. 
10. What is the height of a B tree of order m? 
11. What is the need for tries? 
12. When do tries perform worse than search trees? 
13. Insert the following elements in the order given into an empty B tree of order (i) 3 (ii) 4 and 
(iii) 7 
ZRTADFHQWCVBSEOPLIJIKMNUTX 
Undertake the following operations on the B trees: 
(i) Delete Q (ii) Delete A (iii) Delete M (iv) Delete S 
14. For the data list shown in Review Question 13 (Chapter 11) construct a 3-way search tree. 
Insert G and delete J, K and Z from the search tree. 


15. For the following data list construct a trie: 
ANT ANTELOPE BEAR BUG ELEPHANT ZEBRA BEATLE TIGER ANTEATER BISON 
MONKEY ORANGUTANG CHIMPANZEE KOALA KOEL. 
Perform the following operations on the trie: 
(i) Delete CHIMPANZEE 
(ii) Delete ANTELOPE 
(iii) Insert RHINOCEROS 
(iv) Insert MONGREL 
(v) Delete ANTEATER 


Programming Assignments 


1. Implement a menu driven program to 
(i) construct an m-way search tree for a specific order m, 
(ii) insert elements into the m-way search tree and 
(iii) delete elements from the m-way search tree. 
2. Implement a menu driven demonstration of all the functions pertaining to insert, delete and 
search operation of B trees of order m. 
3. Implement a function to delete a key K from a trie T. Assume that each of the branch nodes 
have a COUNT field which records the number of information nodes in the sub trie for 
which it is the root. 
4. Implement functions to traverse a B tree of order m by inorder, preorder and post order 
traversals. 
5. Execute a function to gather all the information nodes beginning with a specific alphabet 
from a trie representing alphabetical keys. 


CHAPTER 12 


RED-BLACK TREES AND SPLAY TREES 


12.1 Red-Black Trees 
12.2 Splay Trees 
12.3 Applications 


The data structures of Red-Black trees and Splay trees are discussed in this chapter. Red-Black 
trees, which are special forms of binary search trees, have their origins in B trees of order 4 but 
are more efficient than the latter by way of performance and storage considerations. Splay trees 
are binary search trees with a self-adjusting mechanism that renders a better performance with 
regard to what is known as amortized analysis. They are more efficient when compared to their 
binary search tree or AVL tree counterparts. 


12.1 Red-Black Trees 





Introduction to red-black trees 


B trees of order m were discussed in Chapter 11. It was shown how B trees are balanced trees and 
are efficient for use in applications such as file indexing since they serve to reduce disk accesses. 
To recall, the node structure of the B trees appears as shown in Fig. 12.1. A simple implementation 
of the node may call for using sequential data structures such as arrays to hold the keys and the 
pointers to the child nodes. Thus each node of a B tree of order m may be 
represented using two arrays of maximum dimension m and m-1, corresponding to the child 
pointers and keys respectively. This entails wastage of space in the worst case. Also, to search 
for or insert a key demands sequentially searching the keys within each node to determine the 
child pointer along which the search moves down the tree, before it eventually reaches the 
key or inserts the key, as the case may be. If the order of the B tree is small, then an effective 
solution would be to maintain the keys of each of the nodes in the B tree as a binary search tree. 
However it needs to be ensured that the branches linking the nodes of the binary search tree are 
distinguished from those linking the nodes of the B tree. 


[Figure: a B tree node represented by an array of elements of dimension m - 1 and an array of 
pointers of dimension m] 


Fig. 12.1 Node structure of a B tree of order m and its representation 







With regard to B trees of order 4 also referred to as 2-4 trees or 2-3-4 trees, it can be shown 
that a binary search tree representation of each of its nodes, yields a concept that forms the basis 
for a special kind of binary search tree called red-black tree. Consider the 2-4 tree shown in 
Fig. 12.2(a). A binary search tree representation of each of its nodes is shown in Fig. 12.2(b). Here, 
thin lines indicate branches which link the nodes of the binary search tree and thick lines those 
between the nodes of the original B tree. Now if the thin lines and the nodes hanging from them 
were shaded light (red) and the thick lines and the nodes hanging from them were shaded grey 
(black) the resulting tree would be as shown in Fig. 12.2(c). Note that the root node is always 
shaded black. Such a tree is what is known as the red-black tree. 


[Figure: (a) a 2-4 tree with root node [10, 70]; (b) a binary search tree representation of the 
nodes of the 2-4 tree; (c) the resulting red-black tree, with its red nodes and black nodes marked] 
Fig. 12.2 Evolving a red-black tree from a 2-4 tree 







A node with two keys in the 2-4 tree may be represented as a binary search tree in either of 
the two ways as shown in Fig. 12.3(a). For example, the root node of the 2-4 tree [10, 70] may be 
represented in any one of the two ways as shown in Fig. 12.3(b). 





[Figure: (a) a node with 2 keys and its two possible binary search tree representations; 
(b) the root node [10, 70] of the 2-4 tree shown in Fig. 12.2(a) and its two possible binary search 
tree representations] 
Fig. 12.3 Possible binary search tree representations of a node with two keys 


Definition 


A red-black tree is an extended binary search tree in which the nodes and the edges from which 
these nodes emanate are either red or black and satisfy the following properties: 
(i) The root node and the external nodes are always black nodes. 
(ii) [Red Condition] No two red nodes can occur consecutively on the path from the root node 
to an external node. 
(iii) [Black Condition] The number of black nodes on the path from the root node to an external 
node must be the same for all external nodes. 
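The three properties can be turned into a small checker. This is a sketch under stated assumptions: external nodes are modelled as `None` (black by definition and not counted), only internal black nodes are counted on each path, and the binary search tree ordering of the keys is not verified. The class and function names are hypothetical.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, colour, left=None, right=None):
        self.key, self.colour = key, colour
        self.left, self.right = left, right

def black_count(node, parent_is_red=False):
    """Black internal nodes on every path from node to an external node,
    or None if the Red or Black Condition fails within this subtree."""
    if node is None:                      # external node
        return 0
    is_red = node.colour == RED
    if is_red and parent_is_red:          # Red Condition violated
        return None
    left = black_count(node.left, is_red)
    right = black_count(node.right, is_red)
    if left is None or right is None or left != right:
        return None                       # Black Condition violated
    return left + (0 if is_red else 1)

def is_red_black(root):
    """Check property (i) for the root, then the Red and Black Conditions."""
    if root is not None and root.colour != BLACK:
        return False
    return black_count(root) is not None

good = Node(50, BLACK, Node(30, RED), Node(70, RED))
red_red = Node(50, BLACK, Node(30, RED, Node(20, RED)), Node(70, RED))
uneven = Node(50, BLACK, Node(30, BLACK))
print(is_red_black(good), is_red_black(red_red), is_red_black(uneven))
# True False False
```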







Since the colour of the node is same as the colour of the edge from which the node emanates, 
the Red and Black Conditions may be expressed in terms of the edges as well. The Red Condition 
could be alternatively defined as no two red pointers or edges can occur consecutively on a path 
from the root node to an external node. The Black Condition could be redefined as all paths from 
the root node to external nodes must have the same number of black pointers. Besides the above 
mentioned conditions, all pointers linking internal nodes with the external nodes must be black. 

The number of black nodes or edges on the path from a node to an external node is called the 
rank of the node. The rank of all external nodes is 0. 

In this chapter, all red nodes and edges will be represented using empty circles and thin lines 
and all black nodes and edges will be represented using shaded circles and thick lines 
respectively. 


Example 12.1 Figure 12.4 illustrates a red-black tree. Observe that the tree is an extended 
binary search tree whose root and external nodes are black. The Red Condition, where no two 
consecutive red nodes can occur, is satisfied on all the paths from the root to the external nodes. 
The Black Condition, where all root-to-external node paths must contain the same number of 
black nodes, also holds. Every such path in the given tree contains exactly 2 black nodes. 





Fig. 12.4 An example red-black tree 


Representation of a red-black tree 


Since a red-black tree is an extended binary search tree, the kind of node representation used for 
a binary search tree may be employed for the tree as well. However since the colour of a node 
plays a dominant role in the definition of the red-black tree it is essential that the colour is also 
recorded in the node structure as a field (COLOUR). Another scheme could be to record the 
colour of the two pointers emanating from the node. 

The insert/delete operations quite often unbalance a red-black tree. The rebalancing of the tree 
may call for moving up and down the tree during the node adjustments. To facilitate this upward 
movement the node structure of a red-black tree may have a provision for a PARENT field. 
PARENT fields of nodes are pointers to their respective parent nodes. 
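A node structure with the COLOUR and PARENT fields might look as follows. This is one possible layout, not a prescribed one; the class name `RBNode` and the helpers `attach` and `path_to_root` are assumptions introduced for the illustration.

```python
RED, BLACK = "red", "black"

class RBNode:
    def __init__(self, key, colour=RED):
        self.key = key
        self.colour = colour    # COLOUR field
        self.left = None        # left child pointer
        self.right = None       # right child pointer
        self.parent = None      # PARENT field, None for the root

def attach(parent, child, side):
    """Hang child on the given side ('left' or 'right') and set its PARENT."""
    setattr(parent, side, child)
    child.parent = parent
    return child

def path_to_root(node):
    """Follow PARENT fields upwards, as rebalancing does."""
    keys = []
    while node is not None:
        keys.append(node.key)
        node = node.parent
    return keys

root = RBNode(50, BLACK)
a = attach(root, RBNode(30), "left")
b = attach(a, RBNode(40), "right")
print(path_to_root(b))  # [40, 30, 50]
```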


Searching a red-black tree 


Searching a red-black tree for a key is in no way different from the procedure used to search for 
a key in a binary search tree. Algorithm 10.1 discussed in Chapter 10 and which retrieves a key 
ITEM from a binary search tree can be employed for the same. 







Inserting into a red-black tree 


Inserting a key K into a red-black tree follows a procedure exactly similar to the one employed 
for binary search trees. The only concern now is to determine the colour to which the node must 
be set to. If the node is set to black, then the path from the root node to the external node passing 
through the node would have one more black node. This results in the violation of the Black 
Condition of a red-black tree. Hence the other alternative is to set the node to red. Now, if doing 
so leads to the violation of the Red Condition, then the red-black tree is said to be unbalanced. 
To set right the imbalance we need to undertake rotations. 

Let us suppose u is the newly inserted red node and parent_u its consecutive red node which 
is also the parent of node u. Now, u must have a grand parent, grandparent_u which is a black 
node. Based on the position of node u in relation to parent_u and grandparent_u, and the colour 
of the other child of grandparent_u, the imbalances are classified as LLb, LLr, RLb, RLr, LRb, LRr, 
RRb and RRr. Thus if u is inserted as the Left child of parent_u (L), which in turn is the Left child 
of grandparent_u (L), and the other child of grandparent_u is black (b), where the child may in fact 
be an external node which is black, then the imbalance is classified as LLb. Again, if u were to be 
inserted as the Right child of parent_u (R), which in turn is the Left child of grandparent_u (L) whose 
other child is red (r), then the imbalance is classified as LRr, and so on. 

Imbalances of the type LLr, RLr, LRr, and RRr, with 'r' as the suffix, only call for a colour change 
of the nodes to set right the imbalance. On the other hand imbalances of the type LLb, LRb, RRb 
and RLb, with 'b' as the suffix, call for rotations to set right the imbalance. 
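The naming scheme can be sketched as a small function that reads off the three letters from the positions and colours of the nodes involved. The sketch assumes a node structure with COLOUR and PARENT fields and models external nodes as `None`; the names `Node`, `link` and `classify_imbalance` are hypothetical.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, colour):
        self.key, self.colour = key, colour
        self.left = self.right = self.parent = None

def link(parent, child, side):
    setattr(parent, side, child)
    child.parent = parent
    return child

def classify_imbalance(u):
    """Return e.g. 'LRr' for a red node u with a red parent, else None."""
    parent = u.parent
    if parent is None or u.colour != RED or parent.colour != RED:
        return None                              # no Red Condition violation at u
    grand = parent.parent                        # exists: a red node is never the root
    parent_side = "L" if grand.left is parent else "R"
    u_side = "L" if parent.left is u else "R"
    other = grand.right if parent_side == "L" else grand.left
    # An external node (None) counts as a black child.
    suffix = "r" if other is not None and other.colour == RED else "b"
    return parent_side + u_side + suffix

g = Node(50, BLACK)
p = link(g, Node(30, RED), "left")
link(g, Node(70, RED), "right")
u = link(p, Node(40, RED), "right")
print(classify_imbalance(u))  # LRr
```

The first letter records the side of parent_u under grandparent_u and the second the side of u under parent_u, matching the LRr case described in the text.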


LLr, LRr, RRr, and RLr imbalances 


Figure 12.5 illustrates a generic representation of the LLr, LRr, RRr, and RLr imbalances and the 
colour changes that need to be undertaken to set right the imbalance. The notations L, R and r 
inscribed on the edges of the red-black trees illustrate the classification of the imbalance. 


Example 12.2 Consider the red-black tree shown in Fig. 12.6 (a), the insertion of 60 results 
in an LLr imbalance and the tree after colour change is shown in Fig. 12.6(b). For the red-black 
tree shown in Fig. 12.7(a), inserting 184 yields an RRr imbalance. The balanced tree after colour 
change is shown in Fig. 12.7(b). 

Note how the colour of the grandparent_u node changes from black to red in the generic 
representations shown in Fig. 12.5. This is so provided the grandparent_u is not the root. If 
grandparent_u turns out to be the root, then no colour change on it is done. This would therefore 
increase the number of black nodes on all paths from the root (grandparent_u) to the external 
nodes by 1. 

Also, if changing the colour of grandparent_u to red causes further imbalance up the tree, then 
we identify the category of imbalance treating grandparent_u as u and so on until the whole tree 
gets rebalanced after undertaking the appropriate rotations or colour change. 
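The colour change for the four 'r' imbalances can be sketched as follows. Since Fig. 12.5 is not reproduced here, the sketch assumes the customary colour flip: parent_u and the other child of grandparent_u turn black, and grandparent_u turns red unless it is the root. The names used are hypothetical.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, colour):
        self.key, self.colour = key, colour
        self.left = self.right = self.parent = None

def link(parent, child, side):
    setattr(parent, side, child)
    child.parent = parent
    return child

def colour_change(u):
    """Resolve an LLr, LRr, RRr or RLr imbalance at u by recolouring.

    Returns grandparent_u, which the caller treats as the new u if its
    own parent turns out to be red.
    """
    parent = u.parent
    grand = parent.parent
    other = grand.right if grand.left is parent else grand.left
    parent.colour = BLACK
    other.colour = BLACK
    if grand.parent is not None:       # the root is never coloured red
        grand.colour = RED
    return grand

root = Node(90, BLACK)
g = link(root, Node(50, BLACK), "left")
p = link(g, Node(30, RED), "left")
o = link(g, Node(70, RED), "right")
u = link(p, Node(20, RED), "left")
top = colour_change(u)
print(p.colour, o.colour, g.colour)  # black black red
```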


LLb, LRb, RRb, and RLb imbalances 


Figure 12.8 illustrates the generic representations of the LLb, LRb, RRb, and RLb imbalances and 
the respective rotations to rebalance the red-black tree. The notations L, R and b inscribed on the 
edges of the red-black trees illustrate the classification of the imbalance. 


[Figure: for each of the LLr, LRr, RRr and RLr imbalances, the subtree rooted at grandparent_u 
with parent_u and the inserted node u, shown before and after the corresponding colour change] 


Fig. 12.5 Generic representations of LLr, LRr, RRr, and RLr imbalances and their colour change 





[Figure: (a) LLr imbalance after insertion of 60; (b) LLr colour change to balance the tree] 


Fig. 12.6 An example LLr imbalance and its colour change 


RRr imbalance 
Insert 184: after insertion of 184 





(a) RRr imbalance after (b) RRr colour change 
insertion of 184 to balance the tree 


Fig. 12.7 An example RRr imbalance and its colour change 


Here u is the node which is inserted into the tree as the left or right child of parent_u, which 
is in turn the left or right child of grandparent_u. u^L, parent_u^L and grandparent_u^L indicate the 
left subtrees of u, parent_u and grandparent_u respectively. u^R, parent_u^R and grandparent_u^R 
indicate the right subtrees of u, parent_u and grandparent_u respectively. It may be observed that 
the LLb, LRb, RRb and RLb rotations resemble the LL, LR, RR and RL rotations discussed earlier, 
apart from the colour changes that are called for after the rotation. 
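
The insertion procedure described so far can be sketched in Python as follows. This is a minimal illustration rather than the book's own algorithm: the names Node, rotate_up, classify and insert are hypothetical, and the sketch assumes, as is usual, that the trailing r or b of an imbalance names the colour of the other child of grandparent_u. An XYr imbalance is handled by a colour change that may propagate upwards, and an XYb imbalance by a single or double rotation followed by recolouring.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key):
        self.key, self.colour = key, RED          # a new node is inserted red
        self.left = self.right = self.parent = None

def is_red(n):
    return n is not None and n.colour == RED      # external nodes count as black

def rotate_up(x):
    """Single rotation lifting x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def classify(u):
    """Name the imbalance (e.g. 'LRb') at a red node u whose parent is red."""
    p, g = u.parent, u.parent.parent
    x = "L" if g.left is p else "R"
    y = "L" if p.left is u else "R"
    other = g.right if x == "L" else g.left       # other child of grandparent_u
    return x + y + ("r" if is_red(other) else "b")

def insert(root, key, log):
    """Insert key, rebalance, append each imbalance met to log, return the root."""
    u, parent, cur = Node(key), None, root
    while cur is not None:                        # ordinary binary search tree descent
        parent, cur = cur, (cur.left if key < cur.key else cur.right)
    u.parent = parent
    if parent is None:
        root = u
    elif key < parent.key:
        parent.left = u
    else:
        parent.right = u
    while is_red(u.parent):                       # Red condition violated
        p, g = u.parent, u.parent.parent
        kind = classify(u)
        log.append(kind)
        if kind[2] == "r":                        # XYr: colour change only
            g.left.colour = g.right.colour = BLACK
            g.colour = RED
            u = g                                 # re-examine from grandparent_u
        else:                                     # XYb: rotate, then recolour
            top = p
            if kind[0] != kind[1]:                # LRb / RLb need a double rotation
                rotate_up(u)
                top = u
            rotate_up(top)
            top.colour, g.colour = BLACK, RED
            break
    while root.parent is not None:                # a rotation may have moved the root
        root = root.parent
    root.colour = BLACK                           # the root always stays black
    return root

root, log = None, []
for k in [40, 16, 36, 54, 18, 7, 48, 5]:          # keys of Illustrative Problem 12.1
    root = insert(root, k, log)
print(log)                                        # ['LRb', 'RRr', 'RLb', 'LLr']
```

Setting the root black at the end has the same effect as skipping the colour change on grandparent_u when it is the root.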


Example 12.3 Consider the red-black tree shown in Fig. 12.9(a). Let us insert 865 and 861 
into the tree in the order given. Figure 12.9(b) shows the LRb imbalance of the tree after the 
insertion of 865. The LRb rotation rebalancing the tree is shown in Fig. 12.9(c). Insertion of 861 
into the red-black tree shown in Fig. 12.9(c) yields an LRr imbalance (Fig. 12.9(d)) which is set 
right by a colour change shown in Fig. 12.9(e). But lo! this triggers a further imbalance of the type 
RLb with the nodes 865 and 980 turning out to be consecutive red nodes! Figure 12.9(f) shows the 
RLb imbalance in the tree. The RLb rotation rebalancing the tree is shown in Fig. 12.9(g). 


[Figure: generic trees for the LLb, LRb, RRb and RLb imbalances (grandparent_u, parent_u, u and their left and right subtrees) alongside the trees obtained after the LLb, LRb, RRb and RLb rotations]

Fig. 12.8 Generic representations of LLb, LRb, RRb and RLb imbalances and their rotations 





[Figure: (a) a red-black tree; (b) LRb imbalance; (c) after LRb rotation; (d) insert 861 into the red-black tree of Fig. 12.9(c); (e) after LRr colour change (tree is unbalanced); (f) RLb imbalance; (g) after RLb rotation]

Fig. 12.9 An example LRb, RLb imbalance 


Example 12.4 In the red-black tree shown in Fig. 12.10(a), let us insert I, K, L in the order 
given. Insertion of I into the red-black tree results in an LLr imbalance (Fig. 12.10(b)), which 
only calls for a colour change to rebalance the tree. Figure 12.10(c) shows the rebalanced tree 
after the LLr colour change. Insertion of K does not unbalance the tree (Fig. 12.10(d)). However, 
insertion of L results in an RRr imbalance (Fig. 12.10(e)), the rebalancing of which triggers an 
RLb imbalance. Figure 12.10(f) and Fig. 12.10(g) illustrate the RRr colour change and the RLb 
rotation respectively. 


[Figure: (a) a red-black tree; (b) LLr imbalance after inserting I; (c) after LLr colour change; (d) no imbalance after inserting K; (e) RRr imbalance after inserting L; (f) after RRr colour change (leads to RLb imbalance); (g) after RLb rotation]


Fig. 12.10 Inserting into a red-black tree (Example 12.4) 


Deleting from a red-black tree 


Deleting a key K from a red-black tree proceeds as one would to delete the same from a binary 
search tree. In this regard, the cases discussed in Chapter 10 in connection with the deletion of 
key K from a binary search tree, when K is a leaf node or K has a lone subtree (left subtree or right 
subtree only) or K has both left subtree and right subtree hold good here as well. However, if the 
deletion results in an imbalance in the tree then this may call for a colour change or a rotation 
if necessary. 

If the deleted node were red, then there is no way that the Black Condition would be violated 
and hence no imbalance is possible. On the other hand, if the deleted node were black, then 
the Black Condition may well be violated due to the shortage of a black node 
on a specific root-to-external node path. In such a case the tree is said to be unbalanced. 

The imbalance is classified as Left (L) or Right (R) based on whether the deleted node v occurs 
to the left or right of its parent node, parent_v. If the sibling of node v, sibling_v, is a black 
node, then the imbalance is further classified as Lb or Rb. If sibling_v is a red node, then the 
imbalance is classified as Lr or Rr. Based on whether sibling_v has 0, 1 or 2 red children, the Lb 
and Rb imbalances are further subclassified as Lb0, Lb1 and Lb2, and Rb0, Rb1 and Rb2 respectively. 
Similarly, the Lr and Rr imbalances are subclassified as Lr0, Lr1 and Lr2, and Rr0, Rr1 and Rr2 
respectively. During rebalancing, v denotes the node that was deleted but physically replaced by 
another node which takes its place as called for by the delete process. 
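
The classification can be expressed as a small function. The sketch below is illustrative only: Node, red_children and classify_deletion are hypothetical names, and the trees passed in represent the state just after a black node has been removed on one side of parent_v. One assumption is made loudly: when sibling_v is red it cannot itself have red children without violating the Red Condition, so for the Lr and Rr cases the sketch counts the red children of w, the child of sibling_v on the same side as v.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, colour, left=None, right=None):
        self.key, self.colour = key, colour
        self.left, self.right = left, right

def red_children(node):
    """Number of red children of node (external nodes count as black)."""
    if node is None:
        return 0
    return sum(c is not None and c.colour == RED for c in (node.left, node.right))

def classify_deletion(parent_v, side):
    """Name the imbalance when a black node was removed on `side` ('L' or 'R')."""
    sibling_v = parent_v.right if side == "L" else parent_v.left
    if sibling_v.colour == BLACK:
        return f"{side}b{red_children(sibling_v)}"
    # Assumption: with a red sibling, count the red children of w, the
    # child of sibling_v that lies on the same side as v.
    w = sibling_v.left if side == "L" else sibling_v.right
    return f"{side}r{red_children(w)}"

rb0 = Node(30, BLACK, left=Node(20, BLACK))
rb1 = Node(30, BLACK, left=Node(20, BLACK, left=Node(10, RED)))
lb2 = Node(30, RED, right=Node(40, BLACK, Node(35, RED), Node(45, RED)))
rr2 = Node(30, BLACK, left=Node(20, RED, Node(10, BLACK),
                                Node(25, BLACK, Node(22, RED), Node(27, RED))))
print(classify_deletion(rb0, "R"), classify_deletion(rb1, "R"),
      classify_deletion(lb2, "L"), classify_deletion(rr2, "R"))
```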

We deal with imbalances concerning R in the next section. Those pertaining to L have been 
demonstrated as Illustrative Problems 12.3 and 12.4. Nodes superscripted with L indicate their 
left subtrees and those with R indicate their right subtrees. The nodes shaded grey emanating 
from thick lined edges indicate black nodes and those shaded white emanating from thin lined 
edges indicate red nodes. Nodes that are hatched indicate either red or black nodes. 


Rb0, Rb1 and Rb2 imbalances 


Figure 12.11 illustrates the generic representations of Rb0, Rb1 and Rb2 imbalances. The notations 
R, b and 0/1/2 inscribed on the edges of the red-black trees illustrate the classification of the 
imbalance. 

In the case of Rb0 imbalance the rebalancing only calls for a colour change of nodes. The 
two possibilities of Rb0 imbalance are shown in the figure. Rb1 imbalance is of two types, 
indicated as Rb1 (type 1) and Rb1 (type 2). In these, the node sibling_v has a single red child, in 
sibling_v^L or w respectively. The Rb2 imbalance has sibling_v holding two red children, in 
sibling_v^L and w. Rotations as illustrated in the figure are performed for the Rb1 and Rb2 
imbalances. 


Rr0, Rr1 and Rr2 imbalances 


Figure 12.12 illustrates the generic representations of Rr0, Rr1 and Rr2 imbalances. The notations 
R, r and 0/1/2 inscribed on the edges of the red-black tree illustrate the classification of imbalance. 
Rotations are undertaken in all the three cases to rebalance the trees. Rr1 imbalance is of two 
types, indicated as Rr1 (type 1) and Rr1 (type 2). 


Example 12.5 A series of red-black trees, the Rb0, Rb1 and Rb2 imbalances and their 
rebalancing rotations are shown in Fig. 12.13. The R, b, 0/1/2 notations are inscribed on the tree 
to help classify the kind of imbalance. 

Deleting 36 from the red-black tree shown in Fig. 12.13(a) leaves the tree violating the Black 
Condition. The imbalance is classified as Rb0. The rebalancing calls for a mere colour change with 
28 set as a red node. 

Deleting 32 (Fig. 12.13(b)) leads to Rb1 (type 1) imbalance with a violation of the Black 
Condition. The rebalancing rotation pushes 28 up as a red node and changes 26 and 30 to black 
nodes thereby ensuring the satisfaction of both Red and Black Conditions. 

Deleting 59 (Fig. 12.13 (c)) is a case of Rb1(type 2) imbalance. During the rebalancing rotation, 
48 moves up to become the root of the subtree. The Red and Black Conditions are satisfied after 
rotation. 

Deleting 99 (Fig. 12.13(d)) is a case of Rb2 imbalance and the rebalancing pushes 78 up as the 
root of the subtree. The Red and Black Conditions hold good after rebalancing. 


Example 12.6 Figure 12.14 shows a series of red-black trees and the respective deletions 
which trigger the Rr0, Rr1 and Rr2 imbalances. The rotations illustrated are self-explanatory. 


[Figure: generic trees, drawn with parent_v, sibling_v, w and the node v that replaced the deleted node, for the Rb0 imbalance and its colour change, the Rb1 (type 1) and Rb1 (type 2) imbalances and their rotations, and the Rb2 imbalance and its rotation]

Fig. 12.11 Generic representations of Rb0, Rb1 and Rb2 imbalances and their rebalancing mechanisms 


Example 12.7 Consider the red-black tree shown in Fig. 12.15(a). Let us delete a series of keys 
from the tree; the deletions traced in Fig. 12.15 are those of F, S, K, O and L. 





[Figure: generic trees for the Rr0, Rr1 (type 1), Rr1 (type 2) and Rr2 imbalances, drawn with parent_v, sibling_v, w and the node v that replaced the deleted node, alongside the trees obtained after the corresponding rotations]

Fig. 12.12 Generic representations of Rr0, Rr1 and Rr2 imbalances and their rotations 


[Figure: trees before deletion, after deletion and after rebalancing for (a) Delete 36: Rb0 imbalance and colour change; (b) Delete 32: Rb1 (type 1) imbalance and rotation; (c) Delete 59: Rb1 (type 2) imbalance and rotation; (d) Delete 99: Rb2 imbalance and rotation]


Fig. 12.13 Example Rb0, Rb1 and Rb2 imbalances 


[Figure: trees before deletion, after deletion and after rotation for (a) Delete 54: Rr0 imbalance; (b) Delete 60: Rr1 (type 1) imbalance; (c) Delete 44: Rr1 (type 2) imbalance; (d) Delete 41: Rr2 imbalance]

Fig. 12.14 Example Rr0, Rr1 and Rr2 rotations 


[Figure: panels (a) to (f) trace the deletions of F, S, K, O and L; the panel labels read "After Rb0 colour change", "After Rb1 (type 2) rotation", "After Rb0 colour change", "No imbalance" and "After Rb1 (type 1) rotation"]


Fig. 12.15 Deletion operations on a red-black tree (Example 12.7) 
All deletions except that of L trigger an imbalance of the Rb0/1/2 kind. The snapshots of 
the tree after the respective rebalancing operations are shown in Figs 12.15(b-f). Note that the 
deletion of L only triggers the deletion of a red node, after L has been replaced by K. Therefore 
this deletion does not lead to an imbalance in the tree, since the Black Condition is not violated. 


Example 12.8 Consider the red-black tree shown in Fig. 12.16(a). Let us delete a series of keys, 
which includes M and H, from the tree. 
While the deletion of M does not result in an imbalance, since it only causes the deletion of a red 
node, the rest of the deletions result in an imbalance of the Rr0/1/2 kind. Figures 12.16(b-d) illustrate 
the snapshots of the trees after the appropriate rotations to rebalance the tree have been performed. 





[Figure: (a) a red-black tree; (b) no imbalance; (c) after Rr1 (type 2) rotation; (d) after Rr0 rotation, following the deletion of H]
Fig. 12.16 Deletion operations on a red-black tree (Example 12.8) 


Time complexity of search, insert and delete operations on a red-black tree 


Since the search operation on a red-black tree is the same as that on a binary search tree, its 
time complexity is O(h), where h is the height of the tree. In the case of insertion or deletion, 
the operation may call for a colour change that can, in the worst case, propagate up to the root, 
and may also call for a rotation to rebalance the tree. Though each colour change or rotation needs 
only constant time (O(1)), the overall time for an insert/delete operation in the worst case would 
be O(log n). It can be shown that the height of a red-black tree is at most $2\log_2(n+1)$, and 
therefore the search, insert and delete operations, all of which need O(h) time, have a time 
complexity of O(log n). 
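
The two conditions and the height bound can be checked mechanically. The following sketch assumes a tree written as nested tuples (key, colour, left, right) with None for an external node, and measures height as the number of nodes on the longest root-to-leaf path; the function name check and the sample tree are illustrative and are not taken from a figure in the text.

```python
import math

def check(t, parent_red=False):
    """Return (black count, height, size) of t, raising ValueError if the
    Red Condition or the Black Condition is violated."""
    if t is None:
        return 0, 0, 0
    key, colour, left, right = t
    red = colour == "red"
    if red and parent_red:
        raise ValueError(f"Red condition violated at {key}")
    lb, lh, ln = check(left, red)
    rb, rh, rn = check(right, red)
    if lb != rb:
        raise ValueError(f"Black condition violated at {key}")
    return lb + (not red), 1 + max(lh, rh), 1 + ln + rn

def leaf(key, colour):
    return (key, colour, None, None)

tree = (36, "black",
        (16, "red", (7, "black", leaf(5, "red"), None), leaf(18, "black")),
        (48, "black", leaf(40, "red"), leaf(54, "red")))

blacks, height, n = check(tree)
print(blacks, height, n)                  # 2 4 8
print(height <= 2 * math.log2(n + 1))     # True
```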




Splay Trees 12.2 





Introduction to splay trees 


In the case of binary search trees (Sec. 10.2) it was observed that the worst case time complexity 
of an operation on the tree is O(n). Assume a case where a group of records is stored as a binary 
search tree. If a record were to be accessed repeatedly (m times), then the time complexity of the 
operation sequence would be O(m.n) in the worst case. In fact, studies have shown that a node that 
is accessed once is likely to be accessed again. What if there were a data structure which, once a 
node is accessed, radically changes its shape to push the accessed node to the root? This 
adjustment, though expensive the first time a node is accessed, can make repeated accesses to the 
node cheaper. Also, during the process of adjustment, where the nodes are moved around to make 
room for the accessed node, other nodes which are deep down may move up, making their accesses 
relatively cheaper as well. 

Splay trees are such data structures which provide this mechanism. These are binary search 
trees with a self adjusting mechanism which renders them remarkably efficient over a sequence 
of accesses. Nodes which are frequently accessed are moved towards the root thereby rendering 
further retrievals of the same to be efficient. Thus every time a node is accessed either for search 
or insertion, the newly accessed node is pushed towards the root. This would dislodge the other 
nodes to a position away from the root and in course of time would have the inactive nodes 
moving farther and farther away from the root. 

Unlike AVL search trees, which are always height balanced, there is no guarantee that a splay 
tree remains balanced. In fact, if the splay tree turns out to be unbalanced, then an 
access may turn out to be fairly expensive. However, over a long sequence of accesses, splay trees 
may prove to be even cheaper than AVL trees in terms of the number of operations. Such an 
analysis, which is spread over a sequence of operations and in which the expensive operations are 
averaged over the less expensive ones, is called amortized analysis. Even though a single access 
may take O(n) time, the amortized cost of a sequence of m operations on a splay tree is 
O(m log n). 


Splay rotations 


An insert or search operation on a splay tree proceeds as one would on a binary search tree. 
However, after the operation is over, the tree is splayed with regard to the specific node. This 
would mean pushing the node upwards towards the root methodically following what are 
known as splay rotations. Splay rotations are more or less similar to AVL tree rotations (Sec. 10.3.3) 
and proceed bottom up from the node towards the root. 

The splay rotations are performed with regard to the specific node u, its parent parent_u and 
grandparent grandparent_u, until perhaps the root node becomes the parent of u. At that stage, 
which is the last step, the rotation involves only u and the root node. The aim of splaying is to 
move the accessed node u up by two levels at every step. To do this we track the path from the 
root to the accessed node u. Every time the path turns left we term it zig and every time it turns 
right we term it zag. Thus in the case of a single step down the tree, the path could be either a 
zig or a zag. If two steps were to be considered, the path could be any one of zig-zig, zig-zag, 
zag-zig or zag-zag. Since the splaying proceeds bottom up, if the length of the path from the root 
to the accessed node u is even, then the rotations appropriate to the two step series, viz., 
zig-zig, zig-zag, zag-zig or zag-zag, are undertaken. On the other hand, if the length of the path 
from the root to the accessed node u is odd, then the final rotation turns out to be the one 
corresponding to a single step series, either a zig or a zag. The rotations corresponding to the 
single step and two step series are shown in Fig. 12.17. 


[Figure: trees before and after splaying at u for each rotation (zig, zag, zig-zig, zig-zag, zag-zag and zag-zig), with the left and right subtrees of u, parent_u and grandparent_u labelled]



Fig. 12.17 Splay rotations 


It can be seen that while zig and zag represent single rotations corresponding to those of AVL 
trees, zig-zag and zag-zig represent the corresponding double rotations. However, zig-zig and 
zag-zag are not the same as two single rotations applied at u, first about parent_u and then 
about grandparent_u; in these cases the rotation about grandparent_u is performed first. 
Figure 12.18 demonstrates the incorrect implementation of a zig-zig with two single rotations. 
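
The difference can be seen on a small tree. The sketch below assumes trees written as nested tuples (left, key, right), with None for an empty subtree; the function names are illustrative. It compares a zig-zig, which rotates about grandparent_u first and then about parent_u, with the naive alternative of rotating u over parent_u and then over grandparent_u.

```python
def rotate_right(t):
    """Right rotation about the root of t = (left, key, right)."""
    (ll, lk, lr), k, r = t
    return (ll, lk, (lr, k, r))

def zig_zig(t):
    """Rotate about grandparent_u first, then about parent_u."""
    return rotate_right(rotate_right(t))

def two_singles_at_u(t):
    """Rotate u over parent_u, then u over grandparent_u (not a zig-zig)."""
    p, g, g_right = t
    return rotate_right((rotate_right(p), g, g_right))

def leaf(k):
    return (None, k, None)

# grandparent_u = 'g', parent_u = 'p', u = 'u'; A, B, C, D stand for subtrees
t = (((leaf("A"), "u", leaf("B")), "p", leaf("C")), "g", leaf("D"))

correct = zig_zig(t)             # u on top, then p, then g down the right spine
naive = two_singles_at_u(t)      # u on top, g below it, p as the left child of g
print(correct)
print(naive)
```

Both results have u at the root, but only the zig-zig places parent_u between u and grandparent_u.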





[Figure: a zig-zig configuration of grandparent_u, parent_u and u, and the tree that results when it is wrongly handled as two single rotations]


Fig. 12.18 Incorrect zig-zig using two single rotations 


Example 12.9 Consider the binary search tree shown in Fig. 12.19(a). Let us attempt splaying 
the tree at node 9. The path from the root to node 9 involves the path 24-12-10-9. Proceeding 
bottom-up, we perform a zig-zig on the triad, 9 (node u), 10 ( node parent_u) and 12 (node 
grandparent_u). Fig. 12.19(b) shows the tree after the first step. Now the path from the root to node 
9 involves only a single step, viz., zig. At the end of the zig rotation, the tree shown in Fig. 12.19(c) 
is obtained. 

Let us continue to splay the tree shown in Fig. 12.19(c) at 36. The path from the root to node 
36 is given by 9-24-48-36. Proceeding bottom-up, the first step involves a zag-zig case. The 
rotation yields the tree shown in Fig. 12.19(d). Finally a zag case shows up, which results in the 
splay tree shown in Fig. 12.19(e). 


Example 12.10 Build a splay tree inserting the following elements in the sequence shown: 
H, Q, A, N, P, O 
The snapshots of the splay tree during the insertion of each of the elements are shown in Fig. 12.20. 
Insertion of H determines the root of the splay tree (Fig. 12.20(a)). Insertion of Q calls for a zag 
rotation yielding the tree shown in Fig. 12.20(b). While insert A calls for a zig-zig case, insert N 
calls for two splay rotations viz., zag-zig and zag. Insert P calls for a zag-zig case and insert O 
calls for a zig-zag case. 
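
The insertions narrated in this example (H, Q, A, N, P, O) can be traced with a short program. This is a sketch under stated assumptions rather than a definitive implementation: Node, rotate_up, splay and insert are hypothetical names, parent pointers are used for convenience, and each splay step is named by reading the path from grandparent_u (or parent_u) down to u, a left turn being a zig and a right turn a zag.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate_up(x):
    """Single rotation lifting x above its parent."""
    p, g = x.parent, x.parent.parent
    if p.left is x:
        p.left, x.right = x.right, p
        if p.left:
            p.left.parent = p
    else:
        p.right, x.left = x.left, p
        if p.right:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def splay(u, steps):
    """Move u to the root bottom-up, recording the name of each splay step."""
    while u.parent is not None:
        p, g = u.parent, u.parent.parent
        low = "zig" if p.left is u else "zag"      # turn from parent_u to u
        if g is None:                              # last, single step
            steps.append(low)
            rotate_up(u)
            continue
        high = "zig" if g.left is p else "zag"     # turn from grandparent_u to parent_u
        steps.append(f"{high}-{low}")
        if high == low:                            # zig-zig / zag-zag
            rotate_up(p)                           # rotate about grandparent_u first
            rotate_up(u)
        else:                                      # zig-zag / zag-zig
            rotate_up(u)
            rotate_up(u)
    return u

def insert(root, key, steps):
    """Binary search tree insertion followed by splaying at the new node."""
    node = Node(key)
    if root is None:
        return node
    cur = root
    while True:
        nxt = cur.left if key < cur.key else cur.right
        if nxt is None:
            break
        cur = nxt
    if key < cur.key:
        cur.left = node
    else:
        cur.right = node
    node.parent = cur
    return splay(node, steps)

root, trace = None, {}
for k in "HQANPO":
    steps = []
    root = insert(root, k, steps)
    trace[k] = steps
print(trace)
```

The recorded steps agree with the narration: a zag for Q, a zig-zig for A, a zag-zig followed by a zag for N, a zag-zig for P and a zig-zag for O.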


[Figure: (a) a binary search tree; (b) splaying at 9, step 1: zig-zig; (c) splaying at 9, step 2: zig; (d) splaying at 36, step 1: zag-zig; (e) splaying at 36, step 2: zag]
Fig. 12.19 Splaying a binary search tree 


[Figure: snapshots of the splay tree after inserting H, Q, A (zig-zig), N (step 1: zag-zig, step 2: zag), P (zag-zig) and O (zig-zag)]


Fig. 12.20 Snapshots of the splay tree (Example 12.10) 


Some remarks on amortized analysis of splay trees 


Analysis of algorithms involves the computation of best, worst and average case time 
complexities based on the input instances to the algorithm. The analysis determines the work 
done by the algorithm over an instance or a specific class of instances. 

Amortized analysis differs from these analyses in that it estimates the work done 
by an algorithm over a long sequence of events rather than over any single event or a specific 
class of events in isolation. It is the worst case performance of an algorithm over a long sequence 
of events. In fact 'to amortize' itself means to extinguish a debt by regular installments over a 
long period. 

It can sometimes be observed that one operation, though expensive to perform the first 
time, can lead to the same or other operations performed in sequence thereafter being executed 
at a cheaper cost. Amortized analysis addresses problems of such a nature. Thus amortized 
analysis is not average case analysis. While average case analysis deals with the work done by the 
algorithm over a set of independent input instances, amortized analysis deals with the same over 
associated or related instances. 

In the case of splay trees, it is observed that an insert or a search followed by splaying 
results in the specific key moving up to become the root. Let us suppose we were searching for a 
key K in the splay tree. Though the time complexity of such an operation is O(n) in the worst case, 
a subsequent execution of the same operation after splaying would cost far less, since K is then 
at the root. In fact, splay trees have an amortized time complexity of O(log n) per operation. 
Though an individual search operation, for example, may not be O(log n), 
the time complexity of a sequence of m operations on a splay tree would be O(m log n). 

During the splaying process of a binary search tree T, let $T_i(u)$ be the subtree rooted at a 
node u in step i of the splaying. Then the rank of the node u, $r_i(u)$, in step i of the splaying 
process is defined to be 

$$r_i(u) = \log_2 |T_i(u)|$$

where $|T_i(u)|$ indicates the size of the subtree $T_i(u)$. In other words, if the subtree comprises s 
nodes, then the rank of node u is given by $r(u) = \log_2 s$. The credit of a node u is given by its rank 
r(u). If u is a leaf node then $r(u) = \log_2 1 = 0$ and if u is the root of the tree T with n nodes then 
$r(u) = \log_2 n$. Just as heights of trees act as potential functions for the computation of their time 
complexities, ranks act as potential functions for the computation of amortized time 
complexities of splay trees. In fact, while the heights of many nodes in the splay tree may be 
affected during a rotation operation, only the ranks of the participating nodes, viz., u, parent_u and 
grandparent_u, are affected during rotation. 

The total credit balance for a tree is the sum of all the individual credits of its nodes. That is, 

$$Cr_i = \sum_{u \in T_i} r_i(u)$$

where $Cr_i$ is the credit balance of the tree during the ith step of splaying the tree, 
$r_i(u)$ is the rank of the node u in the ith step of splaying and $T_i$ is the splayed tree in step i. 
The amortized complexity $A_i$ of a splay step i is given by 

$$A_i = t_i + Cr_i - Cr_{i-1}$$

where $t_i$ is the work done for the splay operation and $Cr_{i-1}$ and $Cr_i$ are the credit balances of the 
tree before and after the splay operation respectively. $t_i$ is computed as the number of levels the 
target node rises during a splay operation. In the case of zig-zig, zig-zag, zag-zag and zag-zig splay 
operations, $t_i$ is counted as 2 units, whereas in the case of a simple zig or zag splay operation it 
is counted as 1 unit. $(Cr_i - Cr_{i-1})$ gives the change in the credit balance after the splay operation. 
Since only the ranks of the participating nodes, viz., u, parent_u and grandparent_u, change during 
a splay operation, the change in credit balance $(Cr_i - Cr_{i-1})$ need be computed only with 
regard to these nodes. The terms of the summation for the other nodes merely cancel out. 
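
As a small numerical illustration, the ranks, the credit balance and the amortized complexity of one splay step can be computed directly. The sketch assumes trees written as nested tuples (left, key, right), takes the amortized complexity of a step as $A_i = t_i + Cr_i - Cr_{i-1}$, and uses hypothetical function names; the two trees are a three-node left chain before and after a zig-zig at u.

```python
import math

def size(t):
    """Number of nodes in t, where t is None or (left, key, right)."""
    return 0 if t is None else 1 + size(t[0]) + size(t[2])

def rank(t):
    """Rank of the root of t: log2 of the size of its subtree."""
    return math.log2(size(t))

def credit_balance(t):
    """Sum of the ranks of all the nodes of t."""
    if t is None:
        return 0.0
    return rank(t) + credit_balance(t[0]) + credit_balance(t[2])

before = (((None, "u", None), "p", None), "g", None)   # u lies two zig steps below g
after = (None, "u", (None, "p", (None, "g", None)))    # the tree after a zig-zig at u

t_i = 2                                                # u rises by two levels
A_i = t_i + credit_balance(after) - credit_balance(before)
print(rank(before[0][0]), rank(after))                 # rank of u before and after
print(A_i)
```

Here the credit balance is the same before and after the step (log2 3 + 1 in both trees), so the amortized complexity equals the actual work of 2 units.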
The following results hold good with regard to the amortized analysis of splay trees: 
(i) The amortized complexity $A_i$ of the splay tree, if step i of its splaying process initiates a 
zig-zig or a zag-zag step at the specific node u, satisfies the following relation: 
$A_i < 3\,(r_i(u) - r_{i-1}(u))$. 
(ii) The amortized complexity $A_i$ of the splay tree, if step i of its splaying process initiates a 
zig-zag or a zag-zig step at the specific node u, satisfies the following relation: 
$A_i < 2\,(r_i(u) - r_{i-1}(u))$. 
(iii) The amortized complexity $A_i$ of the splay tree, if step i of its splaying process initiates a zig 
or a zag step at the specific node u, satisfies the following relation: 
$A_i < 1 + (r_i(u) - r_{i-1}(u))$. 
(iv) In a binary search tree with n nodes, the amortized cost C(n) of an insertion or search 
of a specific node with splaying does not exceed $(1 + 3\log_2 n)$ upward moves from the 
specific node. 
(v) In a binary search tree with not more than n nodes, the total complexity of a sequence 
of w insertions or search operations with splaying does not exceed $w \cdot (1 + 3\log_2 n) + \log_2 n$. 
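
Result (iv) can be seen to follow from results (i) to (iii) by a telescoping sum. The sketch below assumes that splaying u to the root takes k steps, of which at most the last is a zig or a zag, and uses the fact that the rank of u never decreases as u rises, so that the bound in (ii) is itself at most $3\,(r_i(u) - r_{i-1}(u))$.

```latex
\begin{aligned}
\sum_{i=1}^{k} A_i
  &\le 1 + \sum_{i=1}^{k} 3\,\bigl(r_i(u) - r_{i-1}(u)\bigr) \\
  &= 1 + 3\,\bigl(r_k(u) - r_0(u)\bigr) \\
  &\le 1 + 3\log_2 n
\end{aligned}
```

The last line uses $r_k(u) = \log_2 n$, since u is by then the root of a tree with n nodes, and $r_0(u) \ge 0$.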


Applications 12.3 





Red-black trees which are derived from B trees of order 4 are only a variant of binary search trees. 
Hence any application that calls for binary search trees can also call for red-black trees. 

On the other hand, splay trees, which are binary search trees that undergo splaying, favour 
retrievals which are efficient with regard to their amortized complexity. They are suitable for 
applications with the characteristic that information that is recently retrieved is highly likely to 
be retrieved again in the near future. For example, in the case of a university information system, 
at the beginning of the admission season, those records pertaining to the newly admitted students 
are highly likely to be accessed over and over again in the first few weeks of their entry. In such a 
case, it would be a good move to store the records as a splay tree rather than a binary search tree, 
since a splay tree keeps the recently retrieved records close to the root, while in due course 
those records that are rarely used move farther and farther away from the root and occupy 
positions close to the fringe of the tree. Maintenance of patient records in a hospital information 
system and maintenance of records pertaining to seasonal items in a supermarket information 
system are some examples where splay trees find ideal applications. 

Splay trees have also found applications in data compression, lexicographic search trees and 
dynamic Huffman coding. 





Summary 


Red-black trees are derived from B trees of order 4 and are variants of binary search trees. 
Red-black trees need to satisfy the Red condition which entails no two red nodes can occur 
consecutively on a path in the tree, and the Black condition which insists that the number 
of black nodes on all root-to-external node paths must be the same. 

A search operation on a red-black tree is undertaken the same way as that on a binary 
search tree. The insertion of a key in a red-black tree is similar to the one in a binary search 
tree. However, the inserted node is set to red initially to avoid violation of the Black 
condition. If this results in a violation of the Red condition, then the tree is said to 
be unbalanced. The imbalance is classified as XYr or XYb where X, Y may represent an L 
or R. All XYr imbalances call for a mere colour change to set right the imbalance. On the 
other hand, all XYb imbalances call for rotations to set right the imbalance. 

The deletion of a node in a red-black tree proceeds as one would in a binary search tree. 
In the case of any violation of the Black condition, the imbalances are classified as Xb0, Xb1 
and Xb2 or Xr0, Xr1 and Xr2 where X may be L or R and the appropriate rotations are 
undertaken to set right the imbalance. 

Splay trees are self-adjusting trees which are variants of binary search trees. The search and 
insert operations proceed as they would on binary search trees. However after the 
operation, the inserted key or the searched key is pushed up as the root using splay 
rotations. 

Splay rotations are classified as zig, zag, zig-zig, zig-zag, zag-zag and zag-zig rotations 
based on the position of the specific node at which the splaying is initiated. 

Though an insert or search operation on a splay tree may be expensive when undertaken 
for the first time, the same when considered over a long sequence of operations may prove 
to be efficient. Such an analysis, which is spread over a sequence of operations and in which 
the expensive operations are averaged over the less expensive ones, is called 
amortized analysis. The amortized cost of a sequence of m operations on a splay tree 
is O(m log n). 


Illustrative Problems 


Problem 12.1 Construct a red-black tree inserting the following keys into an empty tree, in 
the sequence given: 


40, 16, 36, 54, 18, 7, 48, 5 


Solution: The snapshots of the red-black tree during its construction are shown in Fig. I 12.1. 
During the insertion of 54 into the tree, an RRr imbalance is encountered (Fig. I 12.1(d)). Rebalancing 
the tree calls for a colour change which would turn the root (36) red, violating the property 
that the root of a red-black tree should be black. In such a case the root is left black, so that 
the number of black nodes on all the paths from the root to the external nodes increases by 1. 

[Figure: snapshots (a) insert 40; (b) insert 16; (c) insert 36; (d) insert 54 (RRr imbalance); (e) insert 18, 7 (no imbalance); (f) insert 48 (RLb imbalance); (g) insert 5 (LLr imbalance)]

Fig. I 12.1 


Problem 12.2 Build a red-black tree using the keys given below: 
PEEPUL, OLIVE, MAPLE, PINE, BANYAN, CHESTNUT 


Solution: Figure I 12.2 illustrates the snapshots of the red-black tree during its construction. 


[Figure: snapshots of the red-black tree during the insertion of the keys; the insertion of CHESTNUT shows an LRb imbalance]


Fig. I 12.2 

Problem 12.3 Obtain generic representations for the Lb0, Lb1 and Lb2 rotations on lines 
similar to that of their Rbx counterparts discussed in Sec. 12.5. 


Solution: The generic representations of the rotations, following notations and a style similar to 
those of their Rbx counterparts, are shown as follows: 


[Figure: generic representations, for the deletion of node v with parent-v and sibling-v, of 
(a) the Lb0 imbalance and the tree after the Lb0 colour change, (b) the Lb1 (type 1) imbalance 
and the tree after the Lb1 (type 1) rotation, (c) the Lb1 (type 2) imbalance and the tree after the 
Lb1 (type 2) rotation, and the Lb2 imbalance and the tree after the Lb2 rotation] 


Problem 12.4 Obtain generic representations for the Lr0, Lr1 and Lr2 rotations on lines 
similar to that of their Rrx counterparts discussed in the main text of this chapter. 


Solution: The generic representations of the rotations, following notations and a style similar to 
those of their Rrx counterparts, are shown as follows: 






[Figure: generic representations of (a) the Lr0 imbalance and the tree after the Lr0 rotation, 
(b) the Lr1 (type 1) imbalance and the tree after the Lr1 (type 1) rotation, (c) the Lr1 (type 2) 
imbalance and the tree after the Lr1 (type 2) rotation, (d) the Lr2 imbalance and the tree after 
the Lr2 rotation] 


Problem 12.5 Perform the corresponding operations on the red-black trees shown as follows: 


[Figure: three red-black trees, labelled Delete C, Delete P and Delete J respectively] 


Solution: The snapshots of the red-black trees after the operations and after rebalancing are 
shown as follows: 

[Figure: each tree before deletion, after deletion and after rebalancing; the imbalances labelled 
include Lb0 (Delete C) and Lb1 (type 2)] 

Problem 12.6 Undertake the respective delete operations on the red-black trees shown as 
follows: 






[Figure: two red-black trees, labelled Delete J and Delete B respectively] 


Solution: The states of the red-black trees after the delete operations and after rebalancing are 
shown as follows: 

[Figure: each tree before deletion, after deletion and after rebalancing; Delete J gives rise to an 
Lr1 (type 2) imbalance and Delete B to an Lr2 imbalance] 





Problem 12.7 Undertake splaying of the following binary search tree at key 81. 


[Figure: the binary search tree to be splayed at key 81] 


Solution: The snapshots of the tree during the splay rotations are shown as follows: 

[Figure: Splay at 81, showing the tree after the Zig-Zag, after the Zag-Zag and after the Zig-Zig 
rotations] 


Problem 12.8 Build a splay tree inserting the following keys in the order shown: 
Fujiyama, Zao, Mt. Etna, Vesuvius, South Sister, Usu 


Solution: The snapshots of the splay tree during its construction are shown below: 

[Figure: the splay tree after each of Insert Fujiyama, Insert Zao, Insert Mt. Etna, Insert Vesuvius, 
Insert South Sister and Insert Usu] 


Problem 12.9 For the splay tree built in Illustrative Problem 12.7, trace the ranks of the 
involved nodes during the splaying steps. 


Solution: The ranks of the participating nodes viz., r(u), r(parent_u) and r(grandparent_u) are 
shown in Table I 12.9. The ranks of the other nodes during the splaying steps are also shown. 
Note how there is no change in their ranks though the heights of some of these nodes do undergo 
changes during the rotations. 


Table I 12.9 


Operation Other nodes in the tree Node u Node 
arent_u 
81 


Sep D w 
Rank of the nodes before log (6) | log ,(4)| log 5(5)| log (1) 
zig-zag operation 
Rank of the nodes after log (6) | log 2(4)| log (5) | log (3) 
zig-zag operation 


Operation Other nodes in the tree Node u Node Node 
parent_u | grandparent_u 


Sey | w| s| aj ofa | = | o 
Rank of the nodes before log (6) | log ,(1)| log ,(1)| log (3) log ,(5) 
zag-zZag operation 
Rank of the nodes after log (6) | log ,(1)| log 5(1)| log 5(5) log (1) 
zag-zag operation 


Operation Other nodes in the tree Node u Node Node 
parent_u | grandparent_u 


wep Cay >a, oja s | 0 
Rank of the nodes before log (1) | log 5(3)| log 5(1)| log 5(5) 
zig-zig operation 
Rank of the nodes after log (1) | log ,(3)| log ,(1)| log (7) 
zig-zig operation 










Problem 12.10 Obtain the amortized complexity of each of the splay steps during the 
splaying of the tree at node 81 shown in Illustrative Problem 12.7. Make use of Table I 12.9 for 
ease of computation. 


Solution: The amortized complexity of a splay step i is given by A_i = t_i + Cr_i - Cr_{i-1}, where t_i, 
the work done, is 1 unit for a zig or a zag operation and 2 units for all other operations. The 
change in the credit balance (Cr_i - Cr_{i-1}) is computed as the difference in the sum of the ranks of 
the participating nodes. 
The amortized complexities of the three steps involved in the splaying of 81 are given below: 
Step 1 zig-zag operation: A_1 = 2 + ((log_2 3 + 0 + 0) - (0 + log_2 2 + log_2 3)) = 2 - log_2 2 = 1 
It can be observed that the amortized time complexity A_i of the zig-zag operation satisfies the 
relation A_i < 2(r_i(u) - r_{i-1}(u)), 
(i.e.) A_1 = 1 
< 2(r_i(81) - r_{i-1}(81)) = 2(log_2 3 - log_2 1) = 2 log_2 3. 
Step 2 zag-zag operation: A_2 = 2 + ((log_2 5 + log_2 3 + 0) - (log_2 3 + log_2 4 + log_2 5)) = 0 
It can be observed that the amortized time complexity A_i of the zag-zag operation satisfies the 
relation A_i < 3(r_i(u) - r_{i-1}(u)), 
(i.e.) A_2 = 0 
< 3(r_i(81) - r_{i-1}(81)) = 3(log_2 5 - log_2 3). 
Step 3 zig-zig operation: A_3 = 2 + ((log_2 7 + log_2 3 + log_2 1) - (log_2 5 + log_2 6 + log_2 7)) = 1 - log_2 5 
It can be observed that the amortized time complexity A_i of the zig-zig operation satisfies the 
relation A_i < 3(r_i(u) - r_{i-1}(u)), 
(i.e.) A_3 = 1 - log_2 5 
< 3(r_i(81) - r_{i-1}(81)) = 3(log_2 7 - log_2 5). 
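
The arithmetic of the three steps can be checked mechanically. The following Python sketch recomputes the three amortized complexities from the ranks quoted in the solution and compares each with its bound. The helper name amortized_cost is an assumption made for illustration and does not appear in the text.

```python
from math import log2

def amortized_cost(work, ranks_after, ranks_before):
    """A_i = t_i + (sum of ranks after the step) - (sum of ranks before it)."""
    return work + sum(ranks_after) - sum(ranks_before)

# Ranks of the participating nodes, copied from the three steps of the solution.
a1 = amortized_cost(2, [log2(3), 0, 0], [0, log2(2), log2(3)])                    # zig-zag
a2 = amortized_cost(2, [log2(5), log2(3), 0], [log2(3), log2(4), log2(5)])        # zag-zag
a3 = amortized_cost(2, [log2(7), log2(3), log2(1)], [log2(5), log2(6), log2(7)])  # zig-zig

# Bounds quoted for the splayed node u = 81 in each step.
bound1 = 2 * (log2(3) - log2(1))
bound2 = 3 * (log2(5) - log2(3))
bound3 = 3 * (log2(7) - log2(5))

for name, cost, bound in (("A1", a1, bound1), ("A2", a2, bound2), ("A3", a3, bound3)):
    print(f"{name} = {cost:.3f}, bound = {bound:.3f}")
```

Floating-point rounding may leave the second value a hair away from zero, which is why the comparison is best made with a small tolerance.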


Review Questions 


1. Which among the following properties does not hold good for a Red-Black tree? 
(i) the root node is always a black node. 
(ii) all external nodes are black nodes. 
(iii) two red nodes can occur consecutively on the path from the root node to an external 
node. 
(iv) the number of black nodes on the path from the root node to an external node must 
be the same for all external nodes. 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
2. Which among the following calls for a rotation to set right the imbalance? 
(i) LRb (ii) LRr (iii) RLr (iv) RRr 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
3. In the context of deletion of a node from a red-black tree, state whether true or false: 
(i) if the deleted node were red, then the Black Condition would be violated and hence 
the tree is unbalanced. 
(ii) if the deleted node were to be black then the Black Condition is violated and hence the 
tree is unbalanced. 
(a) (i) true (ii) true (b) (i) true (ii) false 
(c) (i) false (ii) true (d) (i) false (ii) false 
4. Which among the following properties is not satisfied by a Splay tree? 
(i) splay trees are binary search trees. 
(ii) splay trees result in efficient repeated accesses. 
(iii) splay trees like AVL trees are always height balanced. 
(iv) splay trees have their frequently accessed nodes moving towards the root. 
(a) (i) (b) (ii) (c) (iii) (d) (iv) 
5. In the context of splay rotations, state whether true or false: 
(i) if the length of the path from the root to the accessed node u is even, then the rotations 
undertaken are associated with zig-zig, zig-zag, zag-zig or zag-zag. 
(ii) if the length of the path from the root to the accessed node u is odd, then the rotations 
undertaken are associated with either a zig or a zag. 
(a) (i) true (ii) true (b) (i) true (ii) false 
(c) (i) false (ii) true (d) (i) false (ii) false 
6. What are the merits of Red-Black trees over B-trees of order m? 
7. Outline the generic representation of an XYr imbalance, where X, Y could be either an L or 
R. 
8. What is the need for Splay trees? 
9. How are splay rotations performed? 
10. What is the amortized time complexity of a search operation on a splay tree? 
11. For the following list of data construct a red-black tree: 
LINUX, OS2, DOS, XENIX, SOLARIS, WINDOWS, VISTA, XP, UNIX, CPM 
Undertake the following operations on the tree: 
(i) Insert MAC (ii) Delete WINDOWS (iii) Delete UNIX. 
12. Represent the data list shown in Review Question 11 (Chapter 12) as a Splay tree. Tabulate 
the number of comparisons undertaken for retrieving the following keys: 
(i) LINUX (ii) XENIX (iii) LINUX (iv) LINUX (v) LINUX. 


Programming Assignments 


1. Implement a function RB_IMBALANCE(T) which, given a red-black tree T, tests for the 
violation of the Red and Black Conditions. 

2. Execute a menu-driven program to insert keys into an initially empty red-black tree. Make 
use of the function RB_IMBALANCE(T) developed in Programming Assignment 1 
(Chapter 12) to test for any imbalance. Display the tree after rebalancing it. 

3. Implement a program to accept a non-empty red-black tree (make use of Programming 
Assignment 2 (Chapter 12)) as input and delete all its leaf nodes. Rebalance the tree after 
every deletion and display the rebalanced tree on the screen. 

4. Execute a program with animations and graphics to demonstrate the splaying of a tree given 
a specific node u in the tree. 

5. In the menu-driven program implemented in Programming Assignment 1 (Chapter 10) to 
perform the search, insert and delete operations on a binary search tree, introduce functions 
to splay the tree soon after every insert and search operation is executed. 


CHAPTER 13 


HASH TABLES 


13.1 Introduction 
13.2 Hash Table Structure 
13.3 Hash Functions 
13.4 Linear Open Addressing 
13.5 Chaining 
13.6 Applications 


Introduction 13.1 


The data structures of binary search trees, AVL trees, B trees, tries, red-black trees and splay 
trees discussed so far in this part of the book are tree-based data structures. These are non-linear 
data structures and serve to capture the hierarchical relationship existing between the elements 
forming the data structure. However, there exist applications which deal with linear or tabular 
forms of data, devoid of any superior-subordinate relationship. In such cases employing these 
data structures would be superfluous. Hash tables are one among the data structures which favor 
efficient storage and retrieval of data elements which are linear in nature. 


Dictionaries 


A dictionary is a collection of data elements uniquely identified by a field called the key. A dictionary 
supports the operations of search, insert and delete. The ADT of a dictionary is defined as a set 
of elements with distinct keys supporting the operations of search, insert, delete and create 
(which creates an empty dictionary). While most dictionaries deal with distinct keyed elements, 
it is not uncommon to find applications calling for dictionaries with duplicate or repeated keys. 
In this case it is essential that the dictionary evolves rules to resolve the ambiguity that may arise 
while searching for or deleting data elements with duplicate keys. 

A dictionary supports both sequential and random access. A sequential access is one in which 
the data elements of the dictionary are ordered and accessed according to the order of the keys 
(ascending or descending, for example). A random access is one in which the data elements of 
the dictionary are not accessed according to a particular order. 

Hash tables are ideal data structures for dictionaries. In this chapter we introduce the concept 
of hashing and hash functions. The structure and operations of the hash tables are also discussed. 
The various methods of collision resolution viz., linear open addressing and chaining, and their 
performance analyses are detailed. Finally the application of hash tables in the fields of compiler 
design, relational database query processing and file organization are discussed. 







Hash Table Structure 13.2 





A hash function H(X) is a mathematical function which given a key X of the dictionary D, maps 
it to a position P in a storage table termed hash table. The process of mapping the keys to their 
respective positions in the hash table is called hashing. Figure 13.1 illustrates a hash function. 


Fig. 13.1 Hashing a key 


When the data elements of the dictionary are to be stored in the hash table, each key X_i is 
mapped to a position P_i in the hash table as determined by the value of H(X_i), (i.e.) P_i = H(X_i). 
To search for a key X in the hash table all that one does is to determine the position P by 
computing P = H(X) and access the appropriate data element. In the case of insertion of a key 
X or its deletion, the position P in the hash table where the data element needs to be inserted or 
from where it is to be deleted respectively, is determined by computing P = H(X). 

If the hash table is implemented using a sequential data structure, for example arrays, then the 
hash function H(X) may be so chosen to yield a value that corresponds to the index of the array. 
In such a case, the hash function is a mere mapping of the keys to the array indices. 


Example 13.1 Consider a set of distinct keys {AB12, VP99, RK32, CG45, KLIB, 
OW31, ST65, EX44} to be represented as a hash table. Let us suppose the hash function H 
is defined as below: 

H(XYmn) = ord(X) where X, Y are the alphabetical characters, 
m, n are the numerical characters of the key and ord(X) is the 
ordinal number of the alphabet X. 


The computation of the positions of the keys in the hash table is shown below: 


Key XYmn | H(XYmn) | Position of the key in the hash table 
AB12 | ord(A) | 1 
VP99 | ord(V) | 22 
RK32 | ord(R) | 18 
CG45 | ord(C) | 3 
KLIB | ord(K) | 11 
OW31 | ord(O) | 15 
ST65 | ord(S) | 19 
EX44 | ord(E) | 5 
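
The mapping of Example 13.1 can be sketched in a few lines of Python. The function name ordinal_hash and the use of a Python dict to stand in for the hash table are assumptions made for illustration. The ordinal number is taken as 1 for A through 26 for Z, in line with the positions 1 and 22 computed for AB12 and VP99.

```python
def ordinal_hash(key: str) -> int:
    """H(XYmn) = ord(X): the 1-based alphabet position of the first character."""
    return ord(key[0].upper()) - ord("A") + 1

keys = ["AB12", "VP99", "RK32", "CG45", "OW31", "ST65", "EX44"]

# Position in the hash table -> key stored there (no collisions in this key set).
hash_table = {ordinal_hash(k): k for k in keys}

for position in sorted(hash_table):
    print(position, hash_table[position])
```

Since only the first character takes part in the computation, any two keys that begin with the same letter would map to the same position.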







In Example 13.1, it was assumed that the hash function yields distinct values for the 
individual keys. If this were to be followed as a criterion, then the situation may get out of 
control, since in the case of dictionaries with a very large set of data elements, the hash table 
can be too huge to be handled efficiently. Therefore it is convenient to choose hash functions 
which yield values lying within a limited range so as to restrict the length of the table. This would 
consequently imply that the hash functions may yield identical values for a set of keys. In other 
words, a set of keys could be mapped to the same position in the hash table. Let X_1, X_2, ..., X_n be 
the n keys which are mapped to the same position P in the hash table. Then H(X_1) = H(X_2) = ... = 
H(X_n) = P. In such a case, X_1, X_2, ..., X_n are called synonyms. The act of two or more synonyms 
vying for the same position in the hash table is known as collision. Naturally, this entails a 
modification in the structure of the hash table to accommodate the synonyms. The two important 
methods of linear open addressing and chaining to handle synonyms are presented in Sec. 13.4 
and Sec. 13.5 respectively. 

The hash table accommodating the data elements appears as shown below: 





Hash Functions 13.3 


The choice of the hash function plays a significant role in the structure and performance of the 
hash table. It is therefore essential that a hash function satisfies the following characteristics: 
(i) easy and quick to compute 
(ii) even distribution of keys across the hash table. In other words, a hash function must 
minimize collisions. 




Building hash functions 
The following are some of the methods of obtaining hash functions: 


(i) Folding: The key is first partitioned into two or three or more parts. Each of the individual 
parts are combined using any of the basic arithmetic operations such as addition or 
multiplication. The resultant number could be conveniently manipulated, for example truncated, 
to finally arrive at the index where the key is to be stored. Folding assures better spread of keys 
across the hash table. 


Example Consider a six-digit numerical key: 719532. We choose to partition the key into 
three parts of two digits each, (i.e.) 71 | 95 | 32, and merely add the numerical equivalents of 
the parts, (i.e.) 71 + 95 + 32 = 198. Truncating the result to its last two digits yields 98, which is 
chosen as the index of the hash table where the key 719532 is to be accommodated. 
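
A minimal Python sketch of this folding example follows. The function name fold_hash and its two parameters are assumptions made for illustration; the parts are combined by addition and the sum is truncated by keeping its last digits, as in the example.

```python
def fold_hash(key: int, part_digits: int = 2, index_digits: int = 2) -> int:
    """Split the key into groups of part_digits digits, add the groups,
    and keep the last index_digits digits of the sum."""
    digits = str(key)
    parts = [int(digits[i:i + part_digits])
             for i in range(0, len(digits), part_digits)]
    return sum(parts) % (10 ** index_digits)

index = fold_hash(719532)   # 71 + 95 + 32 = 198, truncated to 98
print(index)
```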


(ii) Truncation: In this method selected digits of the key are extracted to determine the 
index of the hash table where the key needs to be accommodated. In the case of alphabetical keys 
their numerical equivalents may be considered. Truncation, though quick to compute, does not 
ensure even distribution of keys. 


Example Consider a group of six digit numerical keys that need to be accommodated in 
a hash table with 100 locations. We choose to select digits in position 3 and 6 to determine the 
index where the key is to be stored. Thus key 719532 would be stored in location 92 of the hash 
table. 
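
The truncation example can be sketched as below. The function name truncation_hash and the positions parameter are illustrative assumptions; digit positions are counted from 1 starting at the leftmost digit, as in the example.

```python
def truncation_hash(key: int, positions=(3, 6)) -> int:
    """Form the index from the digits of the key at the given 1-based positions."""
    digits = str(key)
    return int("".join(digits[p - 1] for p in positions))

index = truncation_hash(719532)   # digits 9 and 2 are picked
print(index)
```

Keys that agree in the chosen positions collide irrespective of their other digits, which is why the method does not ensure an even distribution.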


(iii) Modular Arithmetic: This is a popular method and the size of the hash table L is 
involved in the computation of the hash function. The function makes use of modulo arithmetic. 
Let k be the numerical key or the numerical equivalent if it is an alphabetical key. The hash 
function is given by 
H(k) = k mod L 
The hash function evidently returns a value that lies between 0 and L-1. Choosing L to be a 
prime number has a proven better performance by way of even distribution of keys. 


Example Consider a group of six digit numerical keys that need to be stored in a hash 
table of size 111. For a key 145682, H(k) = 145682 mod 111 = 50. Hence the key is stored in location 
50 of the hash table. 
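
The sketch below reproduces the computation of the example in Python and adds a small experiment, not taken from the text, that hints at why a prime table size tends to spread keys better. The function name mod_hash and the sample keys (multiples of 10) are assumptions chosen for illustration.

```python
def mod_hash(k: int, table_size: int) -> int:
    """H(k) = k mod L, a value between 0 and L-1."""
    return k % table_size

location = mod_hash(145682, 111)

# Hypothetical key set with a common factor: 10, 20, ..., 110.
sample_keys = range(10, 111, 10)
buckets_used_size_10 = {mod_hash(k, 10) for k in sample_keys}   # non-prime size
buckets_used_size_11 = {mod_hash(k, 11) for k in sample_keys}   # prime size

print(location, len(buckets_used_size_10), len(buckets_used_size_11))
```

With a table size of 10 every sample key lands in the same bucket, while a table size of 11 places each key in a bucket of its own.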





Linear Open Addressing 13.4 


Let us suppose a group of keys are to be inserted into a hash table HT of size L, making use of 
the modulo arithmetic function H(k) = k mod L. Since the range of the hash table index is limited 
to lie between 0 and L-1, for a population of N (N>L) keys collisions are bound to occur. Hence 
a provision needs to be made in the hash table to accommodate the data elements that are 
synonyms. 

We choose to adopt a sequential data structure to accommodate the hash table. Let HT[0:L-1] be 
the hash table. Here the L locations of the hash table are termed buckets. Every bucket provides 
accommodation for the data elements. However, to accommodate synonyms, (i.e.) keys which 
map to the same bucket, it is essential that a provision be made in the buckets. We therefore 
partition buckets into what are called slots to accommodate synonyms. Thus if a bucket b has 
s slots, then s synonyms can be accommodated in the bucket b. In the case of an array 
implementation of a hash table, the rows of the array indicate buckets and the columns the slots. 
In such a case, the hash table is represented as HT[0:L-1, 0:s-1]. The number of slots in a 
bucket needs to be decided based on the application. Figure 13.2 illustrates a general hash table 
implemented using a sequential data structure. 


Fig. 13.2 Hash table implemented using a sequential data structure (L buckets of s slots each) 

Example 13.2 Let us consider a set of keys {45, 98, 
12, 55, 46, 89, 65, 88, 36, 21} to be represented as a hash 
table as shown in Fig. 13.2. Let us suppose the hash 
function H is defined as H(X) = X mod 11. The hash table therefore has 11 buckets. We propose 
3 slots per bucket. Table 13.1 shows the hash function values of the keys and Fig. 13.3 shows the 
structure of the hash table. 


Table 13.1 Hash function values of the keys (Example 13.2) 


Key X | 45 | 98 | 12 | 55 | 46 | 89 | 65 | 88 | 36 | 21 
H(X) | 1 | 10 | 1 | 0 | 2 | 1 | 10 | 0 | 3 | 10 


Observe how keys {45, 12, 89}, {98, 65, 21} and {55, 88} are synonyms mapping to buckets 1, 10 
and 0 respectively. The provision of 3 slots per bucket makes it possible to accommodate 
synonyms. 

Now what happens if a synonym is unable to find a slot in the bucket? In other words, if the 
bucket is full, then where do we find place for the synonyms? In such a case an overflow is said 
to have occurred. All collisions need not result in overflows. But in the case of a hash table with 
single slot buckets, collisions mean overflows. 

The bucket to which the key is mapped by the hash function is known as the home bucket. To 
tackle overflows we move further down, beginning from the home bucket, and look for the 
closest slot that is empty and place the key in it. Such a method of handling overflows is known 
as Linear probing or Linear open addressing or closed hashing. 


Fig. 13.3 Hash table (Example 13.2) 

Example 13.3 Let us proceed to insert the keys {77, 34, 43} in the hash table discussed in 
Example 13.2. The hash function values of the keys are {0, 1, 10}. When we proceed to insert 77 
in its home bucket 0, we find a slot is available and hence the insertion is done. In the case of 34, 
its home bucket 1 is full and hence there is overflow. By linear probing, we look for the closest 
slot that is vacant and find one in the second slot of bucket 2. While inserting 43, we find bucket 
10 to be full. The search for the closest empty slot proceeds by moving downwards in a circular 
fashion until it finds a vacant place in slot 3 of bucket 2. Note the circular movement of searching 
the hash table while looking for an empty slot. Figure 13.4 illustrates the linear probing method 
undertaken for the listed keys. The keys which have been accommodated in places other than 
their home buckets are shown over a grey background. 
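
Examples 13.2 and 13.3 can be traced with the following Python sketch of insertion by linear probing. The names make_table and insert, the use of None to mark an empty slot, and the 0-based bucket and slot indexes are assumptions made for illustration.

```python
def make_table(buckets: int, slots: int):
    """HT[0:buckets-1, 0:slots-1] with every slot empty (None)."""
    return [[None] * slots for _ in range(buckets)]

def insert(table, key: int):
    """Place key in the closest empty slot, starting at its home bucket and
    moving down circularly. Returns the (bucket, slot) that was used."""
    buckets = len(table)
    home = key % buckets                      # H(X) = X mod (number of buckets)
    for step in range(buckets):
        b = (home + step) % buckets           # wrap around past the last bucket
        for s, occupant in enumerate(table[b]):
            if occupant is None:
                table[b][s] = key
                return b, s
    raise OverflowError("hash table is full")

table = make_table(buckets=11, slots=3)
for key in [45, 98, 12, 55, 46, 89, 65, 88, 36, 21]:   # Example 13.2
    insert(table, key)

placed = {key: insert(table, key) for key in (77, 34, 43)}   # Example 13.3
print(placed)
```

In 0-based terms, 34 ends up in slot 1 of bucket 2 and 43 in slot 2 of bucket 2, which correspond to the second and third slots of bucket 2 in the example.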


Fig. 13.4 Linear Open Addressing (Example 13.3) 


Operations on linear open addressed hash tables 


Search: Searching for a key in a linear open addressed hash table proceeds on lines similar to 
that of insertion. If the searched key is available in the home bucket then the search is done, and 
the time complexity in such a case is O(1). However, if there had been overflows while inserting 
the key, then a sequential search has to be called for, which searches through each slot of the 
buckets following the home bucket, until either (i) the key is found, or (ii) an empty slot is 
encountered, in which case the search terminates, or (iii) the search path has curled back to the 
home bucket. In the case of (i) the search is said to be successful. In the case of (ii) and (iii) it is 
said to be unsuccessful. 


Example 13.4 Consider the snapshot of the hash table shown in Fig. 13.5, which represents 
keys whose first character lies between 'A' and 'I', both inclusive. The hash function used is 
H(X) = ord(C) mod 10 where C is the first character of the alphabetical key X. The searches for 
keys F18 and G64 are straightforward since they are present in their home buckets viz., 6 and 7 
respectively. The searches for keys A91 and F78, for example, are a trifle involved in the sense 
that, though they are available in their respective home buckets, they are accessed only after a 
sequential search for them is done in the slots corresponding to their buckets. On the other hand, 
the search for I99 fails to find it in its home bucket viz., 9. This therefore triggers a sequential 
search of every slot following the home bucket until the key is found, in which case the search is 
successful, or until an empty slot is encountered, in which case the search is a failure. I99 is 
indeed found in slot 2 of bucket 2. Observe how the search path curls back to the top of the hash 
table from the home bucket of key I99. Let us now search for the key G93. The search proceeds 
to look into its home bucket (7) before a sequential search for the same is undertaken in the slots 
following the home bucket. The search stops due to its encountering an empty slot and therefore 
the search is deemed unsuccessful. 


Fig. 13.5 Illustration of search in a hash table 







Algorithm 13.1 illustrates the search algorithm for a linear open addressed hash table. 


Algorithm 13.1: Procedure to search for a key X in a linear open addressed hash table 
procedure LOP_HASH_SEARCH(HT, b, s, X) 


/* HT[0:b-1, 0:s-1] is the hash table implemented as a two 
dimensional array. Here b is the number of buckets and s is 
the number of slots. X is the key to be searched in the hash 
table. In case of unsuccessful search, the procedure prints 
the message "KEY not found" otherwise prints "KEY found" */ 


h = H(X); /* H is the hash function computed on X */ 
i = h; j = 0; /* i, j are the indexes for the bucket and slot 
respectively */ 
while (HT[i, j] ≠ 0 and HT[i, j] ≠ X) do 
j = j + 1; /* search for X in the slots */ 
if (j > (s-1)) then j = 0; /* reset slot index to 0 to continue 
searching in the next bucket */ 
if (j == 0) then { i = (i+1) mod b; /* continue searching in 
the next bucket in a 
circular manner */ 
if (i == h) 
then { print ("KEY not found"); exit(); } } 
endwhile 
if (HT[i, j] = X) then print ("KEY found"); 
if (HT[i, j] = 0) then print ("KEY not found"); 


end LOP_HASH_SEARCH. 
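
A runnable counterpart of the search procedure is sketched below in Python. It is an illustration under stated assumptions rather than a transcription of Algorithm 13.1: the table is a list of buckets, each a list of slots, 0 marks an empty slot as in the algorithm, the hash function is passed in as a parameter, and the function returns the (bucket, slot) position or None instead of printing a message. The small table used to exercise it is hypothetical.

```python
def lop_hash_search(table, key, hash_fn):
    """Search a linear open addressed hash table for key.
    Returns (bucket, slot) on success and None on failure."""
    buckets = len(table)
    home = hash_fn(key)
    for step in range(buckets):               # at most one full circle
        b = (home + step) % buckets
        for s, occupant in enumerate(table[b]):
            if occupant == key:
                return b, s                   # case (i): key found
            if occupant == 0:
                return None                   # case (ii): empty slot met
    return None                               # case (iii): back at home bucket

# Hypothetical table of 5 buckets with 2 slots each, H(X) = X mod 5.
# Key 20 overflowed from its full home bucket 0 into bucket 1.
table = [[10, 15], [20, 0], [12, 0], [0, 0], [0, 0]]
mod5 = lambda k: k % 5

found_home = lop_hash_search(table, 12, mod5)
found_displaced = lop_hash_search(table, 20, mod5)
missing = lop_hash_search(table, 25, mod5)
print(found_home, found_displaced, missing)
```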


Insert: The insertion of data elements in a linear open addressed hash table is executed as 
explained in the previous section. The hash function, which is quite often modulo arithmetic 
based, determines the bucket b and thereafter the slot s in which the data element is to be 
inserted. In the case of overflow, we search for the closest empty slot beginning from the home 
bucket and accommodate the key in that slot. Algorithm 13.1 could be modified to execute the 
insert operation. The line 


if (HT[i, j] = 0) then print ("KEY not found"); 


in the algorithm is replaced by 


if (HT[i, j] = 0) then HT[i, j] = X; /* insert X in the empty slot */ 

Delete: The delete operation on a hash table can be clumsy. When a key is deleted it cannot 
be merely wiped off from its bucket (slot). A deletion leaves the slot vacant, and since an empty 
slot is the signal to terminate a search, the elements that follow the vacated slot and have been 
displaced from their home buckets may go unnoticed. To tackle this it is essential that the 
keys following the empty slot are moved up, which makes the whole operation cumbersome. 


An alternative could be to write a special element in the slot every time a delete operation is 
done. During a search this special element camouflages the space vacated by the deletion, so 
that the search proceeds past it, while during an insertion it marks the slot as available to 
accommodate a new element. 

However, it is generally recommended that deletions in a hash table be avoided as much as 
possible due to their clumsy implementation. 
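
The special-element approach can be sketched as follows in Python. The class name ProbedTable, the markers EMPTY and DELETED, and the restriction to single-slot buckets with H(X) = X mod b are all assumptions made to keep the illustration short. The sketch also assumes that a key being inserted is not already present in the table.

```python
EMPTY = object()     # slot that has never held a key
DELETED = object()   # special element written in place of a deleted key

class ProbedTable:
    def __init__(self, size: int):
        self.cells = [EMPTY] * size

    def _probe(self, key: int):
        """Yield bucket indexes from the home bucket onwards, circularly."""
        size = len(self.cells)
        for step in range(size):
            yield (key % size + step) % size

    def insert(self, key: int) -> int:
        for i in self._probe(key):
            if self.cells[i] is EMPTY or self.cells[i] is DELETED:
                self.cells[i] = key           # a deleted slot is reusable
                return i
        raise OverflowError("hash table is full")

    def search(self, key: int):
        for i in self._probe(key):
            if self.cells[i] is EMPTY:        # only a true empty slot stops us
                return None
            if self.cells[i] == key:          # DELETED never equals a key
                return i
        return None

    def delete(self, key: int):
        i = self.search(key)
        if i is not None:
            self.cells[i] = DELETED
        return i

t = ProbedTable(7)
positions = [t.insert(k) for k in (7, 14, 21)]   # all three have home bucket 0
t.delete(14)
still_found = t.search(21)    # the search walks past the deleted slot
reused = t.insert(28)         # the deleted slot accommodates a new key
print(positions, still_found, reused)
```

Had the deleted slot simply been reset to empty, the search for 21 would have stopped at bucket 1 and reported a failure.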







Performance analysis 


The complexity of the linear open addressed hash table is dependent on the number of buckets. 
In the case of hash functions that follow modular arithmetic, the number of buckets is given by 
the divisor L. The best case time complexity of searching for a key in a hash table is O(1) and 
the worst case time complexity is O(n), where n is the number of data elements stored in the 
hash table. A worst case occurs when all the n data elements map to the same bucket. These time 
complexities are no better than those of their linear list counterparts: the best and worst case 
complexities of searching for an element in a linear list of n elements are, respectively, O(1) and 
O(n). However, on an average the performance of the hash table is much more efficient than 
that of linear lists. It has been shown that the average case performance of a linear open 
addressed hash table for unsuccessful and successful search is given by 


U_n ≈ (1/2) (1 + 1/(1 - α)^2) and 


S_n ≈ (1/2) (1 + 1/(1 - α)), 


where U_n and S_n are the number of buckets examined on an average during an unsuccessful and 
successful search respectively. The average is considered over all possible sequences of the n keys 
X_1, X_2, ..., X_n. α is the loading factor of the hash table and is given by α = n/b, where b is the 
number of buckets. The smaller the loading factor, the better is the average case performance of 
the hash table in comparison to that of linear lists. 
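
To get a feel for how quickly the averages grow with the loading factor, the two expressions can be evaluated for a few values of α. The Python sketch below assumes the standard linear probing estimates, U_n ≈ (1/2)(1 + 1/(1 - α)^2) and S_n ≈ (1/2)(1 + 1/(1 - α)); the function names are illustrative.

```python
def unsuccessful_probes(alpha: float) -> float:
    """U_n: average buckets examined in an unsuccessful search."""
    return 0.5 * (1 + 1 / (1 - alpha) ** 2)

def successful_probes(alpha: float) -> float:
    """S_n: average buckets examined in a successful search."""
    return 0.5 * (1 + 1 / (1 - alpha))

for alpha in (0.25, 0.5, 0.75, 0.9):
    print(f"alpha = {alpha:4.2f}  U_n = {unsuccessful_probes(alpha):6.2f}  "
          f"S_n = {successful_probes(alpha):5.2f}")
```

At a loading factor of 0.5 the estimates are 2.5 and 1.5 buckets, whereas at 0.9 an unsuccessful search examines about 50 buckets on an average.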


Other collision resolution techniques with open addressing 


The drawbacks of linear probing or linear open addressing could be overcome to an extent by 
employing one or more of the following strategies: 


(i) Rehashing A major drawback of linear probing is clustering or primary clustering wherein 
the hash table gives rise to long sequences of records with gaps in between the sequences. This 
leads to longer sequential searches especially when an empty slot needs to be found out. The 
problem could be resolved to an extent by resorting to what is known as rehashing. In this, a 
second hash function is used to determine the slot where the key is to be accommodated. If the 
slot is not empty, then another function is called for and so on. 

Thus rehashing makes use of at least two functions H, H' where H(X), H'(X) map keys X to 
any one of the b buckets. To insert a key, H(X) is computed and the key X is accommodated in 
the bucket if it is empty. In the case of a collision, the second hash function H'(X) is computed 
and the search sequence for an empty slot proceeds by computing 


h_i = (H(X) + i · H'(X)) mod b, i = 1, 2, ... 


Here h_1, h_2, ... is the search sequence before an empty slot is found to accommodate the key. It 
needs to be ensured that H'(X) does not evaluate to 0, since the search sequence would then 
never move away from the home bucket. A good choice for H'(X) is given by M - (X mod M) 
where M is chosen to be a prime smaller than the hash table size (see Illustrative Problem 13.6). 
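
The search sequence of rehashing can be sketched as follows in Python. The assumptions, which go beyond the text, are that H(X) = X mod b, that each bucket has a single slot with None marking an empty one, and that b is prime so that the sequence can reach every bucket. The function name rehash_insert and the sample keys are illustrative.

```python
def rehash_insert(table, key: int, m: int) -> int:
    """Insert key using h_i = (H(X) + i * H'(X)) mod b with
    H(X) = X mod b and H'(X) = M - (X mod M). Returns the bucket used."""
    b = len(table)
    home = key % b
    step = m - (key % m)          # lies between 1 and M, so it is never 0
    for i in range(b):            # i = 0 is the home bucket itself
        pos = (home + i * step) % b
        if table[pos] is None:
            table[pos] = key
            return pos
    raise OverflowError("no empty bucket found on the search sequence")

table = [None] * 11               # b = 11 buckets
positions = [rehash_insert(table, key, m=7) for key in (22, 33, 44)]
print(positions)
```

All three keys share the home bucket 0, yet their second hash values differ (2 for 33 and 5 for 44), so they scatter to buckets 0, 2 and 5 instead of crowding into the adjacent buckets 0, 1 and 2 as linear probing would have them do.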




(ti) Quadratic probing This is another method that can substantially reduce clustering. In 
this method when a collision occurs at address h, unlike linear probing which probes buckets in 
locations h + 1, h + 2 ....etc., the technique probes buckets at locations h + 1, h + 4, h + 9, ... ete. 
In other words, the method probes buckets at locations (h + iĉ) mod b, i= 1, 2, ... where h is the 
home bucket and b is the number of buckets. However, there is no guarantee that the method 
gives a fair chance to probe all locations in the hash table. Though quadratic probing reduces 
primary clustering, it may result in probing the same set of alternate cells. Such a case known as 
secondary clustering occurs especially when the hash table size is not prime. 

If b is a prime number then quadratic probing probes about half the number of locations in
the hash table (the home bucket and (b - 1)/2 others). In this case, the method is guaranteed to
find an empty slot if the hash table is at least half empty (see Illustrative Problems 13.4, 13.5).
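The difference between a prime and a non-prime table size can be checked by listing the distinct buckets that quadratic probing can reach from a home bucket. The sketch below is illustrative only; the function name `quadratic_probe_set` is hypothetical, and i = 0 is included so that the home bucket itself is counted.

```python
def quadratic_probe_set(h, b):
    """Distinct buckets reachable from home bucket h via (h + i*i) mod b."""
    # i and i + b land on the same bucket, so i in 0..b-1 covers every case
    return sorted({(h + i * i) % b for i in range(b)})

print(quadratic_probe_set(1, 9))    # [1, 2, 5, 8]
print(quadratic_probe_set(1, 11))   # [1, 2, 4, 5, 6, 10]
```

With b = 9 only four of the nine buckets are ever probed, whereas with the prime b = 11 the probes reach six of the eleven buckets.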


(iii) Random probing Unlike quadratic probing where the increment during probing was
definite, random probing makes use of a random number generator to obtain the increment and
hence the next bucket to be probed. However, it is essential that the random number generator
function generates the same sequence every time it is invoked for a given key, so that a search
retraces the buckets probed during insertion. Though this method reduces clustering, it can be a
little slow when compared to others.
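One way to obtain a repeatable sequence is to seed the generator with the key itself. The following is a minimal sketch under that assumption (the text does not prescribe how the generator is seeded); `random_probe_sequence` is a hypothetical name and H(X) = X mod b is assumed as the hash function.

```python
import random

def random_probe_sequence(key, b):
    """Order in which the b buckets are probed for `key`."""
    home = key % b                    # assumed hash function H(X) = X mod b
    rng = random.Random(key)          # seeding by key makes the sequence repeatable
    increments = list(range(1, b))    # every non-zero increment exactly once
    rng.shuffle(increments)
    return [home] + [(home + inc) % b for inc in increments]

seq = random_probe_sequence(45, 11)
print(seq[0])                                  # 1, the home bucket of key 45
print(seq == random_probe_sequence(45, 11))    # True
```

Shuffling the increments 1 to b - 1 ensures that every bucket is visited exactly once before the sequence is exhausted.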


Chaining 13.5 


In the case of linear open addressing, the solution of accommodating synonyms in the closest
empty slot may contribute to a deterioration in performance. For example, the search for a
synonym key may involve sequentially going through every slot occurring after its home bucket
before it is either found or declared absent. Also, the implementation of the hash table using a
sequential data structure such as arrays, limits its capacity (b x s slots). While increasing the
number of slots to minimize overflows may lead to wastage of memory, containing the number of
slots to the bare minimum may lead to severe overflows hampering the performance of the hash
table. An alternative to overcome this malady is to keep all synonyms that are mapped to the same
bucket chained to it. In other words, every bucket is maintained as a singly linked list with synonyms
represented as nodes. The buckets themselves continue to be represented as a sequential data
structure as before, so that the hash function value directly indexes the bucket. Such a method of
handling overflows is called chaining or open hashing or separate chaining. Figure 13.6 illustrates a
chained hash table.


Fig. 13.6 A chained hash table







In Fig. 13.6, observe how the buckets have been represented sequentially and each of the
buckets is linked to a chain of nodes which are synonyms mapping to the same bucket.

Chained hash tables only acknowledge collisions. There are no overflows per se since any 
number of collisions can be handled provided there is enough memory to handle them! 


Example 13.5 Let us consider the set of keys {45, 98, 12, 55, 46, 89, 65, 88, 36, 21} listed in 
Example 13.2, to be represented as a chained hash table. The hash function H used is H(X) = X 
mod 11. The hash function values for the keys are as shown in Table 13.1. The structure of the 
chained hash table is as shown in Fig. 13.7. 


Fig. 13.7 Hash table (Example 13.5)


Observe how each of the groups of synonyms viz., {45, 12, 89}, {98, 65, 21} and {55, 88} are
represented as singly linked lists corresponding to the buckets 1, 10 and 0 respectively. In
accordance with the norms pertaining to singly linked lists, the link field of the last synonym in
each chain is a null pointer. Those buckets which are yet to accommodate keys are also marked
null.


Operations on chained hash tables 


Search: The search for a key X in a chained hash table proceeds by computing the hash
function value H(X). The bucket corresponding to the value H(X) is accessed and a sequential
search along the chain of nodes is undertaken. If the key is found then the search is termed
successful, otherwise unsuccessful. If the chain is too long, maintaining the chain in order
(ascending or descending) helps in rendering the search efficient.







Algorithm 13.2 illustrates the procedure to undertake search in a chained hash table. 


Algorithm 13.2: Procedure to search for a key X in a chained hash table


procedure CHAIN_HASH_SEARCH(HT, b, X)
/* HT[0:b-1] is the hash table implemented as a one
   dimensional array of pointers to buckets. Here b is the
   number of buckets. X is the key to be searched in the hash
   table. In case of unsuccessful search, the procedure prints
   the message “KEY not found” otherwise prints “KEY found” */

   h = H(X);        /* H(X) is the hash function computed on X */

   TEMP = HT[h];    /* TEMP is the pointer to the first node in the chain */

   while (TEMP ≠ NIL and DATA(TEMP) ≠ X) do   /* search for the key
                                                 down the chain; NIL is
                                                 tested first so that DATA
                                                 is never read from NIL */
      TEMP = LINK(TEMP);

   endwhile

   if (TEMP = NIL) then print (“KEY not found”);

   else print (“KEY found”);

end CHAIN_HASH_SEARCH.
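For readers who prefer executable code, the search procedure can be rendered in Python roughly as below. This is a sketch rather than the author's implementation: the `Node` class, the helper `build_chained_table` (which appends each key at the end of its chain) and the use of H(X) = X mod b are assumptions made for illustration, with the keys of Example 13.5 as data.

```python
class Node:
    """One synonym in a chain: a DATA field and a LINK field."""
    def __init__(self, data, link=None):
        self.data = data
        self.link = link

def build_chained_table(keys, b):
    """Chain each key to bucket X mod b, appending at the end of its chain."""
    ht = [None] * b
    for x in keys:
        node = Node(x)
        h = x % b
        if ht[h] is None:
            ht[h] = node
        else:
            temp = ht[h]
            while temp.link is not None:
                temp = temp.link
            temp.link = node
    return ht

def chain_hash_search(ht, b, x):
    """Return True if x is found in the chain of its home bucket."""
    temp = ht[x % b]
    # test for the end of the chain before reading the node's data
    while temp is not None and temp.data != x:
        temp = temp.link
    return temp is not None

ht = build_chained_table([45, 98, 12, 55, 46, 89, 65, 88, 36, 21], 11)
print(chain_hash_search(ht, 11, 89))   # True
print(chain_hash_search(ht, 11, 76))   # False
```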


Insert: To insert a key X into a hash table, we compute the hash function H(X) to determine the
bucket. If the key is the first node to be linked to the bucket then all that it calls for, is a mere
execution of a function to insert a node in an empty singly linked list. In the case of keys which
are synonyms, the new key could be inserted either in the beginning or at the end of the chain
leaving the list unordered. However, it would be prudent and less expensive too, to maintain
each of the chains in the ascending or descending order of the keys. This would also render the
search for a specific key amongst its synonyms more efficient.


Example 13.6 Let us insert keys {76, 48} into the chained hash table shown in Fig. 13.7. 
Since 76 has already three synonyms in its chain corresponding to bucket 10, we choose to insert 
it in order in the list. On the other hand 48 is the first key in its bucket viz., 4. Figure 13.8 
illustrates the insertion. 

Algorithm 13.2 could be modified to insert a key. It merely calls for the insertion of a node in
a singly linked list that is unordered or ordered.


Delete: Unlike that of linear open addressed hash tables, the deletion of a key X in a chained 
hash table is elegantly done. All that it calls for, is a search for X in the corresponding chain and 
a deletion of the respective node. 
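The insert and delete operations can be sketched in Python as shown below. This is an illustrative sketch only: the function names are hypothetical, chains are assumed to be kept in ascending order, and H(X) = X mod b is assumed as the hash function. The keys are those of Examples 13.5 and 13.6.

```python
class Node:
    """One synonym in a chain: a DATA field and a LINK field."""
    def __init__(self, data, link=None):
        self.data = data
        self.link = link

def insert_ordered(ht, x):
    """Insert x into its chain, keeping the chain in ascending order."""
    h = x % len(ht)                  # assumed hash function H(X) = X mod b
    prev, temp = None, ht[h]
    while temp is not None and temp.data < x:
        prev, temp = temp, temp.link
    node = Node(x, temp)
    if prev is None:
        ht[h] = node                 # x becomes the first node of the chain
    else:
        prev.link = node

def delete(ht, x):
    """Unlink the node holding x; return False if x is absent."""
    h = x % len(ht)
    prev, temp = None, ht[h]
    while temp is not None and temp.data != x:
        prev, temp = temp, temp.link
    if temp is None:
        return False
    if prev is None:
        ht[h] = temp.link
    else:
        prev.link = temp.link
    return True

def chain(ht, h):
    """Keys of bucket h in chain order."""
    keys, temp = [], ht[h]
    while temp is not None:
        keys.append(temp.data)
        temp = temp.link
    return keys

ht = [None] * 11
for key in [45, 98, 12, 55, 46, 89, 65, 88, 36, 21, 76, 48]:
    insert_ordered(ht, key)
print(chain(ht, 10))   # [21, 65, 76, 98]
print(chain(ht, 4))    # [48]
delete(ht, 65)
print(chain(ht, 10))   # [21, 76, 98]
```

Unlike deletion in a linear open addressed table, no other key has to be moved or marked: only the links around the deleted node change.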


Performance Analysis 


The complexity of the chained hash table is dependent on the length of the chain of nodes 
corresponding to the buckets. The best case complexity of a search is O(1). A worst case occurs 
when all the n elements map to the same bucket and the length of the chain corresponding to 
that bucket is n, with the searched key turning out to be the last in the chain. The worst case 
complexity of the search in such a case is O(n). 




Fig. 13.8 Inserting keys into a chained hash table


On an average, the complexity of the search operation on a chained hash table is given by

$U_n \approx \frac{1}{2}(1 + \alpha), \quad \alpha \ge 1$ and

$S_n \approx 1 + \frac{\alpha}{2}$

where $U_n$ and $S_n$ are the number of nodes examined on an average during an unsuccessful and
successful search respectively. $\alpha$ is the loading factor of the hash table and is given by
$\alpha = n/b$, where n is the number of keys and b is the number of buckets.
The average case performance of the chained hash table is superior to that of linear open
addressed hash table.
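As a worked instance, take the average-case estimates for a chained hash table to be $U_n \approx \frac{1}{2}(1 + \alpha)$ for $\alpha \ge 1$ and $S_n \approx 1 + \frac{\alpha}{2}$, with $\alpha = n/b$. For an illustrative table of n = 22 keys in b = 11 buckets (values chosen here only for the arithmetic):

```latex
\alpha = \frac{n}{b} = \frac{22}{11} = 2, \qquad
U_n \approx \frac{1}{2}(1 + \alpha) = 1.5, \qquad
S_n \approx 1 + \frac{\alpha}{2} = 2
```

So, on an average, an unsuccessful search examines about 1.5 nodes and a successful search about 2 nodes, independent of the total number of keys as long as the loading factor stays at 2.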


Applications 13.6 





In this section, we discuss the application of hash tables in the fields of compiler design, relational 
data base query processing and file organization. 


Representation of a keyword table in a compiler 


In Sec. 10.4, the application of binary search trees and AVL trees for the representation of symbol
tables in a compiler was discussed. Hash tables find application in the same problem.







A keyword table which is a static symbol table is best represented by means of a hash table.
Each time a compiler needs to decide whether a string is a keyword or a user-id, the string is
searched against the keyword table. An appropriate hash function could be designed to minimize
collisions amongst the keywords and yield the bucket where the keyword could be found. A
successful search indicates that the string encountered is a keyword and an unsuccessful search
indicates it is a user-id. Considering the significant fact that but for retrievals, no insertions or
deletions are permissible on a keyword table, hash tables turn out to be one of the best
propositions for the representation of symbol tables.


Example 13.7 Consider a subset of a keyword set commonly used in programming
languages, viz., {while, repeat, and, or, not, if, else, begin, end, function,
procedure, int, float, Boolean}. For simplicity we make use of the hash function H(X) =
ord(C) - 1 where C is the first character of the keyword X. Figure 13.9 illustrates a linear open
addressed hash table with two slots per bucket (HT[0..25, 0..1]) and a chained hash table
representation for the keyword set. Considering the efficient retrievals promoted by the hash
table, the choice of the data structure for the symbol table representation contributes to the
efficient performance of a compiler as well.
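A chained version of such a keyword table can be sketched in Python as follows. The sketch assumes that ord maps the letters a to z onto 1 to 26, so that H(X) = ord(C) - 1 yields a bucket in 0..25, and that the first character is compared case-insensitively (needed for Boolean); the names `H`, `TABLE` and `is_keyword` are illustrative.

```python
KEYWORDS = ["while", "repeat", "and", "or", "not", "if", "else", "begin", "end",
            "function", "procedure", "int", "float", "Boolean"]

def H(word):
    """Bucket from the first character: 'a' -> 0, ..., 'z' -> 25."""
    return ord(word[0].lower()) - ord("a")

# 26 buckets, each holding the chain of keywords that start with that letter
TABLE = [[] for _ in range(26)]
for kw in KEYWORDS:
    TABLE[H(kw)].append(kw)

def is_keyword(word):
    """True for a keyword; False means the string is treated as a user-id."""
    h = H(word)
    return 0 <= h < len(TABLE) and word in TABLE[h]

print(TABLE[8])              # ['if', 'int']
print(is_keyword("while"))   # True
print(is_keyword("count"))   # False
```

Since the table is static, it is built once and only searched thereafter; the longest chain here holds two keywords.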


(a) Linear open addressed hash table HT[0..25, 0..1] (b) Chained hash table
Fig. 13.9 Hash table representations for a keyword set







Hash tables in the evaluation of a join operation on relational databases 


Relational data bases support a selective set of operations viz., selection, projection, join (natural
join, equi-join) and so on, which aid query processing. Of these, the natural join operation is most
commonly used in relational data base management systems. Indicated by the notation ⋈,
the operation works on two relations (data bases) to combine them into a single relation. Given
two relations R and S a natural join operation of the two data bases is indicated as R ⋈ S.
The resulting relation is a combination of the two relations based on attributes common to the
two relations.


Example 13.8 Consider the two relations ITEM_DESCRIPTION and VENDOR shown in
Fig. 13.10(a). The ITEM_DESCRIPTION relation describes the items and the VENDOR relation
contains details about the vendors supplying the items. The relation ITEM_DESCRIPTION contains
the attributes ITEM_CODE and ITEM_NAME. The VENDOR relation contains the attributes
ITEM_CODE, VENDOR_NAME, ADDRESS (city). A query pertaining to who the vendors are
for a given item code calls for joining the two relations. The join of the two relations yields the
relation shown in Fig. 13.10(b). Observe how the natural join operation combines the two relations
on the basis of their common attribute ITEM_CODE. Those tuples (rows) of the two relations
having a common attribute value in the ITEM_CODE field are “joined” together to form the output
relation.


Relation: ITEM_DESCRIPTION

ITEM_CODE    ITEM_NAME
P402         Pump.hp4-5-6
M636         Motor.621P
S706         Stabilizer.VA500

Relation: VENDOR

ITEM_CODE    VENDOR_NAME            ADDRESS
P402         Premier Electricals
M636         Bharath Electronics
S706         India Electricals      Kolkata

(a)

Relation: ITEM_DESCRIPTION ⋈ VENDOR

ITEM_CODE    ITEM_NAME           VENDOR_NAME            ADDRESS
P402         Pump.hp4-5-6        Premier Electricals
M636         Motor.621P          Bharath Electronics
S706         Stabilizer.VA500    India Electricals      Kolkata

(b)
Fig. 13.10 Natural join of two relations





One method of evaluating a join is to use the hash method. Let H(X) be the hash function where 
X is the attribute value of the relations. Here H(X) is the address of the bucket which contains 
the attribute value and a pointer to the appropriate tuple corresponding to the attribute value. 
The pointer to the tuple is known as Tuple Identifier (TID). TIDs in general, besides containing the 
physical address of the tuple of the relation, also hold identifiers unique to the relation. The hash 
tables are referred to as hash indexes in relational data base terminology. 







A natural join of the two relations R and S over a common attribute ATTRIB, results in each 
bucket of the hash indexes recording the attribute values of ATTRIB along with the TIDs of the 
tuples in relations R and S whose R.ATTRIB = S.ATTRIB. 

When a query associated with the natural join is to be answered all that it calls for is to access 
the hash indexes to retrieve the appropriate TIDs associated with the query. Retrieving the tuples 
using the TIDs satisfies the query. 


Example 13.9 Figure 13.11(a) shows a physical view of the two relations 
ITEM_DESCRIPTION and VENDOR. Figure 13.11(b) shows the hash function values based on 
which the hash table (Fig. 13.11(c)) has been constructed. The hash function used is not discussed. 
Each bucket of the hash index records the TIDs of the attribute values mapped to the bucket. 
Thus TIDs corresponding to ITEM_CODE = P402 of both the relations, are mapped to bucket 16
and so on.

Bucket    Slots (ITEM_DESCRIPTION | VENDOR)
          M636 -> 4002  |  M636 -> 7002
[14]      S706 -> 4003  |  S706 -> 7003
[16]      P402 -> 4001  |  P402 -> 7001

(c) Hash table
Fig. 13.11 Evaluation of natural join operation using hash indexes










Assume that a query “List the vendor(s) supplying the item P402” is to be processed. To 
process this request, we first compute H(“P402”) which as shown in Fig. 13.11(b) yields the 
bucket address 16. Accessing bucket 16 we find the TID corresponding to the relation VENDOR 
is 7001. To answer the query, all that needs to be done is to retrieve the tuple whose TID is 7001. 

A general query such as “List the vendors supplying each of the items” may call for 
sequentially searching each of the hash indexes corresponding to each attribute value of 
ITEM_CODE. 
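The use of a hash index for answering such queries can be sketched in Python as below. This is a simplified illustration, not a database implementation: a Python dictionary (itself a hash table) stands in for the hash index, the tuples carry only the fields needed here (ADDRESS is omitted), and the TID values follow Example 13.9, with 7002 for M636 being an assumed reading. All function names are hypothetical.

```python
# Tuples of each relation stored by TID: (ITEM_CODE, name)
ITEM_DESCRIPTION = {
    4001: ("P402", "Pump.hp4-5-6"),
    4002: ("M636", "Motor.621P"),
    4003: ("S706", "Stabilizer.VA500"),
}
VENDOR = {
    7001: ("P402", "Premier Electricals"),
    7002: ("M636", "Bharath Electronics"),   # TID 7002 is an assumed value
    7003: ("S706", "India Electricals"),
}

def build_hash_index(r, s):
    """Map each ITEM_CODE value to (TIDs in r, TIDs in s)."""
    index = {}
    for slot, relation in enumerate((r, s)):
        for tid, row in relation.items():
            index.setdefault(row[0], ([], []))[slot].append(tid)
    return index

def vendors_supplying(code, index):
    """Answer 'list the vendor(s) supplying the item <code>'."""
    _, s_tids = index.get(code, ([], []))
    return [VENDOR[tid][1] for tid in s_tids]

def natural_join(index):
    """Join tuples of both relations that share an ITEM_CODE value."""
    return [(code, ITEM_DESCRIPTION[rt][1], VENDOR[st][1])
            for code, (r_tids, s_tids) in index.items()
            for rt in r_tids for st in s_tids]

index = build_hash_index(ITEM_DESCRIPTION, VENDOR)
print(vendors_supplying("P402", index))   # ['Premier Electricals']
print(len(natural_join(index)))           # 3
```

A query on a single item code touches one bucket only, whereas the general query walks through every entry of the index, as noted above.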


Hash tables in a direct file organization 


File organization deals with methods and techniques to structure data in external or auxiliary storage
devices such as tapes, disks, drums etc. A file is a collection of related data termed as records.
Each record is uniquely identified by what is known as a key, which is a datum or a portion of
data in the record. The major concern in all these methods is the access time when records
pertaining to the keys (primary or secondary) are to be retrieved from the storage devices, or
are to be updated, inserted or deleted. Some of the commonly used file organization schemes are
sequential file organization, serial file organization, indexed sequential access file organization
and direct file organization. Chapter 14 discusses files and their methods of organization in
detail.

The direct file organization (see Sec. 14.8) which is a kind of file organization method, employs
hash tables for the efficient storage and retrieval of records from the storage devices. Given a file
of records $\{f_1, f_2, f_3, \ldots, f_N\}$ with keys $\{k_1, k_2, k_3, \ldots, k_N\}$ a hash function H(k) where k is the
record key, determines the storage address of each of the records in the storage device. Thus direct
files undertake direct mapping of the keys to the storage locations of the records with the records of
the file organized as a hash table.


Summary 


> Hash tables are ideal data structures for dictionaries. They favor efficient storage and 
retrieval of data lists which are linear in nature. 

> A hash function is a mathematical function which maps keys to positions in the hash tables
known as buckets. The process of mapping is called hashing. Keys which map to the same
bucket are called synonyms. In such a case a collision is said to have occurred. A bucket
may be divided into slots to accommodate synonyms. When a bucket is full and a synonym
is unable to find space in the bucket then an overflow is said to have occurred.

> The characteristics of a hash function are that it must be easy to compute and at the same 
time minimize collisions. Folding, truncation and modular arithmetic are some of the 
commonly used hash functions. 

> A hash table could be implemented using a sequential data structure such as arrays. In 
such a case, the method of handling overflows where the closest slot that is vacant is 
utilized to accommodate the synonym key is called linear open addressing or linear 
probing. However, in course of time, linear probing can lead to the problem of clustering 
thereby deteriorating the performance of the hash table to a mere sequential search! 

> The other alternative methods of handling overflows are rehashing, quadratic probing and 
random probing. 







> A linked implementation of a hash table is known as chaining. In this all the synonyms are
chained to their respective buckets as a singly linked list. On an average, a chained hash
table is superior in performance when compared to that of a linear probed hash table.

> Hash tables have found applications in the design of symbol tables in compiler design, 
query processing in relational database management systems and direct file organization. 


Illustrative Problems


Problem 13.1 Insert the following data into a hash table implemented using linear open
addressing. Assume the buckets to have 3 slots each. Make use of the hash function H(X) = X
mod 9.

{17, 09, 34, 56, 11, 71, 86, 55, 22, 10, 4, 39, 49, 52, 82, 13, 40, 31,
35, 28, 44}

Solution: The linear open addressed hash table is shown in Fig. I 13.1. Those keys not
accommodated in their home buckets are shown in shaded background.

Fig. I 13.1
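The insertions of Problem 13.1 can be traced with a short Python sketch. It assumes that, on an overflow, the search moves to the next bucket in sequence (wrapping around after the last bucket) and takes the first free slot found; the function name `linear_probe_insert` is hypothetical.

```python
def linear_probe_insert(ht, key, b, s):
    """Place key in the first bucket with a free slot, scanning from its home bucket."""
    home = key % b                        # H(X) = X mod b
    for offset in range(b):
        j = (home + offset) % b           # wrap around after the last bucket
        if len(ht[j]) < s:
            ht[j].append(key)
            return j
    raise OverflowError("hash table is full")

b, s = 9, 3                               # 9 buckets with 3 slots each
ht = [[] for _ in range(b)]
keys = [17, 9, 34, 56, 11, 71, 86, 55, 22, 10, 4, 39, 49, 52, 82, 13, 40, 31,
        35, 28, 44]
# keys that could not be accommodated in their home buckets
displaced = [k for k in keys if linear_probe_insert(ht, k, b, s) != k % b]
print(displaced)   # [13, 40, 31, 28, 44]
print(ht[5])       # [86, 13, 40]
```

Keys 13, 40 and 31 all overflow from bucket 4 and spill into buckets 5 and 6, a small instance of the clustering that linear probing is prone to.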


Problem 13.2 For the set of keys listed in Illustrative 
Problem 13.1, trace a chained hash table making use of the same 
hash function. 


Solution: The chained hash table is shown in Fig. I 13.2. The 
nodes in the chain are inserted in the ascending order. 


Fig. I 13.2







Problem 13.3 Comment on the statement: “To minimize collisions in a linear open addressed
hash table it is recommended that the ratio of the number of buckets in a hash table to the number
of keys to be stored in the hash table is made bigger”

Solution: No, this is illogical since increasing the number of buckets will only lead to wastage
of space.

Problem 13.4 For the set of keys {17, 9, 34, 56, 11, 4, 71, 86, 55, 10, 39, 49, 52, 82, 31, 13, 22,
35, 44, 20, 60, 28} obtain a hash table following quadratic probing. Make use of the hash function
H(X) = X mod 9. What are your observations?


Solution: Quadratic probing employs the function $(h + i^2) \bmod n$, i = 1, 2, ... where n = 9, to
determine the empty slot during collisions. Here h is the address of the home bucket given by the
hash function H(X), where X is the key. The quadratic probed hash table is as shown in Fig. I 13.4.
Note how during the insertion of keys 13 and 22, their home bucket viz., 4 is full. To handle this
collision, quadratic probing begins searching buckets $(4 + 1^2) \bmod 9$, $(4 + 2^2) \bmod 9$, .... Since
the first searched bucket 5 has empty slots the keys find accommodation there. However, in the
case of key 44, to handle its collision with bucket 8, quadratic probing searches for an empty slot as
ordered by the sequence $(8 + 1^2) \bmod 9$, $(8 + 2^2) \bmod 9$, ... The search for an empty slot is
successful when the bucket $(8 + 1^2) \bmod 9 = 0$ is encountered. 44 is accommodated in slot 2 of the
bucket 0.
The case of inserting key 28 is interesting, for, despite the hash table containing free slots,
quadratic probing is unable to find an empty slot to accommodate the key. The sequence searched
is $(1 + 1^2) \bmod 9$, $(1 + 2^2) \bmod 9$, $(1 + 3^2) \bmod 9$, ....
An important observation regarding quadratic probing is that there is no guarantee of finding
an empty slot in a quadratic probed hash table if the hash table size is not prime. In this example
the hash table size is not prime.

Fig. I 13.4 (key 28 fails to find an empty slot)







Problem 13.5 For the set of keys listed in Illustrative Problem 13.4, 
obtain a hash table following quadratic probing and employing the hash 
function H(X) = X mod 11. What are your observations? 


Solution: The quadratic probed hash table for the given set of keys using 
the hash function is shown in Fig. I 13.5. 

An important observation regarding this example is that quadratic 
probing can always find an empty slot to insert a key if the hash table 
size is prime and the table is at least half empty. 
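The observations of Illustrative Problems 13.4 and 13.5 can be reproduced with the Python sketch below. It assumes buckets of 3 slots each in both problems and gives up on a key after b probes, since the probe sequence $(h + i^2) \bmod b$ repeats from then on; the names `quadratic_insert` and `build` are hypothetical.

```python
def quadratic_insert(ht, key, b, s):
    """Insert key using probes (h + i*i) mod b; return the bucket used, or None."""
    h = key % b
    for i in range(b):                  # i = 0 is the home bucket itself
        j = (h + i * i) % b
        if len(ht[j]) < s:
            ht[j].append(key)
            return j
    return None                         # every probed bucket is full

def build(keys, b, s=3):
    """Build the table and collect the keys that found no empty slot."""
    ht = [[] for _ in range(b)]
    failed = [k for k in keys if quadratic_insert(ht, k, b, s) is None]
    return ht, failed

keys = [17, 9, 34, 56, 11, 4, 71, 86, 55, 10, 39, 49, 52, 82, 31, 13, 22, 35,
        44, 20, 60, 28]
print(build(keys, 9)[1])    # [28]
print(build(keys, 11)[1])   # []
```

With 9 buckets, key 28 only ever probes buckets 1, 2, 5 and 8, all of which are full, although five slots remain free elsewhere. With 11 buckets every key is accommodated.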





Fig. I 13.5


Problem 13.6 For the set of keys {11, 55, 13, 35, 71, 52, 61, 9, 86, 31, 49, 85, 70} obtain the
hash table which employs rehashing for collision resolution. Assume the hash function to be
H(X) = X mod 9 and the rehashing function to be H’(X) = 7 - (X mod 7). The collision resolution
function is given by $h_i = (H(X) + i \cdot H'(X)) \bmod b$, i = 1, 2, ....


Solution: The hash table for the problem is shown below:
Observe how during the insertion of key 49 a collision occurs and its bucket 4 (H(49) = 49 mod
9 = 4) is found to be full. Rehashing turns to the next hash function H’(49) = 7 - (49 mod 7) = 7 to
help obtain the empty slot to accommodate the key. The slot searched is
$h_1 = (H(49) + 1 \cdot H'(49)) \bmod 9 = 2$. Since the bucket contains a vacant slot, key
49 is accommodated in the slot.

In the case of key 85 which once again collides with the keys in bucket
4, rehashing computes H’(85) = 6 and $h_1 = (H(85) + 1 \cdot H'(85)) \bmod 9 = 1$.
Key 85 is accommodated in bucket 1 slot 2. Finally, following similar lines,
key 70 is accommodated in bucket 5.


Problem 13.7 Assume a chained hash table in which each of the chains
is implemented as a binary search tree rather than a singly linked list.
Build such a hash table for the keys {9, 10, 6, 20, 14, 16, 5, 40, 4, 2, 7, 3,
8} using the hash function H(X) = X mod 5 where X is the key. What are
the advantages of adopting this system?


Solution: A chained hash table with each of the chains implemented as a binary search tree is
shown in Fig. I 13.7.

The advantage is that during the search operation, a binary search tree based chain of n nodes
would record O(log n) performance on an average (a badly skewed tree can still degrade to
O(n)). In contrast a linear chain would report O(n) complexity on an average.










Problem 13.8 Fill in Table I 13.8(a) with the number of comparisons made, when the 
elements shown in row 1 of the table ({ 66, 100, 55, 3, 99, 144}) are either successfully or 
unsuccessfully searched over the list of elements { 66, 42, 96, 100, 3, 55, 99} when the latter is 
represented as (i) sequential list (ii) binary search tree and (iii) linear probing based hash table 
with single slot buckets using the hash function h(X)= X mod 7. 


Table I 13.8(a)

Representation of data elements    Number of comparisons
                                   66    100    55    3    99    144
Sequential list
Binary search tree
Hash table





Solution: Representing the elements of the list to be searched as a sequential list, yields {3, 42, 
55, 66, 96, 99, 100}. The number of comparisons made for searching 66 is 4 and that for 144 which 
is an unsuccessful search is 7. 

Representation of the elements in the list as a binary search tree is given in Fig. I 13.8. The 
number of comparisons made for the element 66 is 1 and that for 144 is 3. 

Representation of the elements as a linear probed hash table
with single slot bucket is shown below. The hash function used
is h(X) = X mod 7. The data element displaced from the home
bucket is shown over grey background. The number of
comparisons made for the element 66 is 1 and that for 144 is 7.


Fig. | 13.8 





The comparisons for the rest of the elements are shown in Table I 13.8(b).


Table I 13.8(b)

Representation of data elements    Number of comparisons
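The comparison counts for Problem 13.8 can be worked out with the Python sketch below. It counts one comparison per element examined, which is an assumed convention chosen to agree with the figures quoted in the solution for 66 and 144; the function names are hypothetical.

```python
ELEMENTS = [66, 42, 96, 100, 3, 55, 99]
SEARCHED = [66, 100, 55, 3, 99, 144]

def sequential_comparisons(x):
    """Left-to-right scan of the ordered sequential list."""
    ordered = sorted(ELEMENTS)
    for count, item in enumerate(ordered, start=1):
        if item == x:
            return count
    return len(ordered)

def bst_comparisons(x):
    """Nodes visited in the binary search tree built in the given order."""
    left, right = {}, {}
    root = ELEMENTS[0]
    for k in ELEMENTS[1:]:
        node = root
        while True:
            branch = left if k < node else right
            if node in branch:
                node = branch[node]
            else:
                branch[node] = k
                break
    count, node = 0, root
    while node is not None:
        count += 1
        if x == node:
            break
        node = (left if x < node else right).get(node)
    return count

def hash_comparisons(x, b=7):
    """Buckets examined in a linear probed table with single slot buckets."""
    table = [None] * b
    for k in ELEMENTS:
        j = k % b
        while table[j] is not None:
            j = (j + 1) % b
        table[j] = k
    j = x % b
    for count in range(1, b + 1):
        if table[j] is None or table[j] == x:
            return count
        j = (j + 1) % b
    return b

print([sequential_comparisons(x) for x in SEARCHED])   # [4, 7, 3, 1, 6, 7]
print([bst_comparisons(x) for x in SEARCHED])          # [1, 3, 3, 3, 4, 3]
print([hash_comparisons(x) for x in SEARCHED])         # [1, 1, 1, 2, 1, 7]
```

The hash table needs a single comparison for every key stored in its home bucket, but the unsuccessful search for 144 scans the whole table because all seven buckets are occupied.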


Review Questions


1. Hash tables are ideal data structures for ______.
   (a) dictionaries (b) graphs (c) trees (d) none of these
2. State whether true or false:
   In the case of linear open addressed hash table with multiple slots in a bucket,
   (i) overflows always mean collisions, and
   (ii) collisions always mean overflows
   (a) (i) true (ii) true (b) (i) true (ii) false (c) (i) false (ii) true (d) (i) false (ii) false
3. In the context of building hash functions, find the odd term out in the following list:
   Folding, modular arithmetic, truncation, random probing
   (a) folding (b) modular arithmetic (c) truncation (d) random probing
4. In the case of a chained hash table of n elements with b buckets, assuming that a worst case
   resulted in all the n elements getting mapped to the same bucket, then the worst case time
   complexity of a search on the hash table would be given by
   (a) O(1) (b) O(n/b) (c) O(n) (d) O(b)
5. Match the following:
   (A) rehashing        (i) collision resolution
   (B) folding          (ii) hash function
   (C) linear probing
   (a) (A, (i)) (B, (ii)) (C, (ii))
   (b) (A, (i)) (B, (ii)) (C, (i))
   (c) (A, (ii)) (B, (i)) (C, (i))
   (d) (A, ( )) (B, ( )) (C, ( ))
6. What are the advantages of using modulo arithmetic for building hash functions?
7. How are collisions handled in linear probing?
8. How are insertions and deletions handled in a chained hash table?
9. Comment on the search operation for a key K in a list L represented as (i) sequential list
   (ii) a chained hash table and (iii) linear probed hash table
10. What is rehashing? How does it serve to overcome the drawbacks of linear probing?
11. The following is a list of keys. Making use of a hash function h(k) = k mod 11, represent
    the keys in a linear open addressed hash table with buckets containing (i) 3 slots and (ii) 4
    slots.
    090 890 678 654 234 123 245 678 900 111 453 231 112 679 238 876 009 122
    233 344 566 677 899 909 512 612 723 823 956 221 331 441 551
12. For the problem in Review Questions 11 (Chapter 13), resolve collisions by means of (i)
    rehashing that makes use of an appropriate rehashing function and (ii) quadratic probing.
13. For the problem in Review Questions 11 (Chapter 13), implement a chained hash table.







Programming Assignments


1. Implement a hash table using an array data structure. Design functions to handle overflows
   using (i) linear probing (ii) quadratic probing and (iii) rehashing. For a set of keys observe
   the performance when the methods listed above are executed.

2. Implement a hash table for a given set of keys using chaining method of handling
   overflows. Maintain the chains in the ascending order of the keys. Design a menu driven
   front end to perform the insert, delete and search operations on the hash table.

3. The following is a list of binary keys:

   0011, 1100, 1111, 1010, 0010, 1011, 0111, 0000, 0001, 0100, 1000, 1001, 0011.

   Design a hash function and an appropriate hash table to store and retrieve the keys
   efficiently. Compare the performance when the set is stored as a sequential list.

4. Store a dictionary of limited set of words as a hash table. Implement a spell check program
   which given an input text file will check for the spelling using the hash table based
   dictionary and in the case of misspelled words will correct the same.

5. Let TABLE_A and TABLE_B be two files implemented as a table. Design and implement a
   function JOIN (TABLE_A, TABLE_B) which will “natural join” the two files as discussed in
   Sec. 13.6. Make use of an appropriate hash function.


CHAPTER 14


FILE
ORGANIZATIONS


14.1 Introduction
14.2 Files
14.3 Keys
14.4 Basic file operations
14.5 Heap or Pile organization
14.6 Sequential file organization
14.7 Indexed sequential file organization
14.8 Direct file organization


Introduction 14.1


One of the main components of a computer system is the memory, also referred to as the main
memory or the internal memory of the computer. Memory is a storage repository of data that is
used by the CPU during its processing.

When the CPU has to process voluminous data, the computer system has to take recourse to
external memory or external storage to store the data, due to the limited capacity of the internal
memory. The devices which provide support for external memory are known as external storage
devices or auxiliary storage devices. Given that the main memory of the computer is the primary
memory, the external memory is also referred to as secondary memory and the devices as
secondary storage devices. Examples of secondary storage devices are magnetic tapes, magnetic
disks, drums, floppies etc. While internal memory of a computer system is volatile, meaning that
data may be lost when the power goes off, secondary memory is nonvolatile.

Each of the secondary storage devices has its distinct characteristics. Magnetic tapes, built
on the principle of audio tape devices, are sequential storage devices that store data sequentially. On
the other hand, magnetic disks, drums and floppy diskettes are random access storage devices that
can store and retrieve data both sequentially and randomly. The random access storage devices
are also known as direct access storage devices. Section 17.2 discusses the structure and
configuration of magnetic tapes and disks in detail.

The growing demands of information have called for the support of what are known as tertiary
storage devices. Though these devices are capable of storing huge volumes of data running to
terabytes at lesser costs, they are characterized by significantly higher read/write time when
compared to that of secondary storage devices. Examples of tertiary storage devices are optical
disk juke boxes, ad hoc tape storage and tape silos.

The organization of data in the internal memory calls for an application of both sequential and 
linked data structures. In the same vein, the organization of data in the secondary memory also 
calls for a number of strategies for their efficient storage and retrieval. The organization of data 
in the secondary memory is known as files. 




In this chapter, we discuss the concept of files and their methods of organization, viz., heap or 
pile files, sequential files, indexed sequential files and direct files. 





Files 14.2 


A file is commonly thought of as a folder that holds a sheaf of related documents arranged 
according to some order. In the context of secondary storage devices, the storage and organization 
of related data is referred to as a file. In fact a file is a logical organization of data. A file is technically 
defined to be a collection of records. A record is a logical collection of fields. A field is a collection 
of characters, which can be either numeric or alphabetic or alphanumeric. A file could be a 
collection of fixed length records or variable length records, where length of a record is indicative of 
the number of characters that makes up a record. 

Let us consider the example of a student file. The file is a logical collection of student records. 
A student record is a collection of fields such as roll number, name, city, date of birth, 
grade etc. Each of these fields could be numeric, alphabetic or alphanumeric. A sample set of
student records is shown below:

Student file

roll number | name | city | date of birth | grade
06MX89 | Sukh Dev | Bangalore | 10m9%1 | B














A file is a logical entity and has to be mapped on to a physical medium for its storage and 
access. To facilitate storage it is essential to know the field length or field size (normally specified 
in bytes). Thus every file has its physical organization. 
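The role of field sizes in the physical layout of a record can be illustrated with Python's `struct` module. The field lengths below (6, 20, 15, 8 and 1 bytes) and the date of birth value are hypothetical choices made for this sketch and are not taken from the text; only the field names follow the student file.

```python
import struct

# Hypothetical field sizes in bytes:
# roll number 6, name 20, city 15, date of birth 8, grade 1
RECORD_FORMAT = "=6s20s15s8s1s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)   # 50 bytes per record

def pack_record(roll, name, city, dob, grade):
    """Lay the fields out as one fixed length record (short fields are zero padded)."""
    fields = (roll, name, city, dob, grade)
    return struct.pack(RECORD_FORMAT, *(f.encode("ascii") for f in fields))

def unpack_record(raw):
    """Recover the field values from a fixed length record."""
    return tuple(f.rstrip(b"\x00").decode("ascii")
                 for f in struct.unpack(RECORD_FORMAT, raw))

# The date of birth below is a made-up value in DDMMYYYY form
raw = pack_record("06MX89", "Sukh Dev", "Bangalore", "01011981", "B")
print(RECORD_SIZE)             # 50
print(unpack_record(raw)[1])   # Sukh Dev
```

With fixed length records, the i-th record of a file (counting from 0) begins at byte offset i × RECORD_SIZE, which is what makes direct access to a record possible.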

For example, the student file stored on a magnetic tape would have the records listed above 
occurring sequentially as shown in Fig. 14.1. In such a case the processing of these records would 
only call for the application of sequential data structures. In fact, in the case of magnetic tapes, 
the logical organization of the records in the files and their physical organization when stored 
in the tape, are one and the same. 


Fig. 14.1 Physical organization of the student file on a magnetic tape







On the other hand, on a magnetic disk the student file 
could be stored either sequentially or non-sequentially 
(random access) as called for by the applications using the 
file. In the case of random access the records are physically 
stored in various portions of the disk where space is available. 
Figure 14.2 illustrates a snapshot of the student file storage 
in the disk. The logical organization of the records in the file 
is kept track of by physically linking the records through 
pointers. The processing of such files would call for linked 
data structures. Thus, in the case of magnetic disks, for files 
that have been stored in a non-sequential manner, the logical 
and the physical organizations need not coincide. 

The physical organization of the files is designed and 
ordered by the File Manager of the operating system. 


Fig. 14.2 Physical organization of the student file on a magnetic disk 


Keys 14.3 





In a file, one or more fields could serve to uniquely identify 
the records for efficient retrieval and storage. These fields are 
known as primary keys or commonly, keys. For example, in the student file discussed above, roll 
number could be designated as the primary key for it uniquely identifies each student and hence 
the record too. 

If additional fields were added to the primary key, the combination would still continue to 
uniquely identify the record. Such a combination of fields is referred to as a super key. For 
example, the combination of roll number and name would still continue to uniquely identify 
records in the student file. A primary key can therefore be described as a minimal super key. 

It is possible to have more than one combination of fields that can serve to uniquely identify 
a record. These combinations are known as candidate keys. It is then up to the file 
administrator to choose any one combination as the primary key. In such a case, the rest of the 
combinations are called alternate keys. For example, consider the employee file shown below. 
Here, both the fields employee number and social security number could act as 
primary keys since either would serve to uniquely identify the record. Thus we term them 
candidate keys. If we chose employee number as the primary key then social 
security number would be referred to as an alternate key. 

A field or a combination of fields that may not be a candidate key but can serve to classify 
records based on a particular characteristic is called a secondary key. For example, in the employee 
file, department could be a secondary key to classify employees based on the department. 


Employee file 

employee number | name | social security number | department | designation 
M345 | Abdul | IN-E-765432190 | | Engineer 
T786 | Bhagath | IN-E-678902765 | | Officer 
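
The distinction between super keys and secondary keys can be tried out on a few records. The following is only a minimal sketch in Python: the department values and the third record are hypothetical, and the helper names is_super_key and group_by are introduced here purely for illustration. Note that uniqueness on a sample can refute, but never establish, that a combination of fields is a key of the file.

```python
from collections import defaultdict

# Sample employee records. The department values and the third record
# are hypothetical, added only so that duplicates can be observed.
employees = [
    {"emp_no": "M345", "name": "Abdul", "ssn": "IN-E-765432190",
     "dept": "Mining", "desig": "Engineer"},
    {"emp_no": "T786", "name": "Bhagath", "ssn": "IN-E-678902765",
     "dept": "Accounts", "desig": "Officer"},
    {"emp_no": "K112", "name": "Abdul", "ssn": "IN-E-112233445",
     "dept": "Mining", "desig": "Officer"},
]


def is_super_key(records, fields):
    """True if the combination of fields is distinct in every record."""
    seen = {tuple(r[f] for f in fields) for r in records}
    return len(seen) == len(records)


def group_by(records, field):
    """Classify records on a secondary key; returns value -> employee numbers."""
    groups = defaultdict(list)
    for r in records:
        groups[r[field]].append(r["emp_no"])
    return dict(groups)


print(is_super_key(employees, ["emp_no"]))          # a candidate key
print(is_super_key(employees, ["emp_no", "name"]))  # a super key, not minimal
print(is_super_key(employees, ["dept"]))            # duplicates, so not a key
print(group_by(employees, "dept"))                  # department as secondary key
```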

























Basic File Operations 14.4 


Some of the basic file operations are 
(i) open, which prepares the files concerned for reading or writing. Commonly a file pointer 
is opened and set at the beginning of the file that is to be read or written. 
(ii) read, when the contents of the record pointed to by the file pointer are read. 
(iii) insert, when new records are added to the file. 
(iv) delete, when existing records are removed from the file. 
(v) update, when data in one or more fields in the existing records of the file are modified. 
(vi) reset, when the file pointer is set to the beginning of the file. 
(vii) close, when the files that were opened for operations are closed for access. 

Commercial implementations of programming languages provide a variety of other file 
operations. However, from the data structure stand point the operations of insert, delete and 
update are considered significant and therefore we shall restrict our discussion to these 
operations alone. 

In the case of deletion, the operation could be executed logically or physically as determined 
by the application. In the case of physical deletion the records are physically removed from the file. 
On the other hand, in the case of logical deletion, the record is either ‘flagged’ or ‘marked’ to 
indicate that it is not in use. Every record has a bit or a byte called the deletion marker which is 
set to some value indicating deletion of the record. Though the records are physically available 
in the file, they are logically excluded from consideration during file processing. The logical 
deletion also facilitates restoration of the deleted records which could be done by “unflagging” 
or “unmarking” the records. 
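
Logical deletion with a deletion marker can be sketched as follows. This is an illustration under assumptions rather than a prescribed implementation: the file is modelled as a Python list held in memory, the Record class and the function names are hypothetical, and the second student record is invented.

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: str
    data: str
    deleted: bool = False      # the deletion marker


def logical_delete(file, key):
    """Flag the record; it stays physically present in the file."""
    for rec in file:
        if rec.key == key and not rec.deleted:
            rec.deleted = True
            return True
    return False


def restore(file, key):
    """Unflag a logically deleted record."""
    for rec in file:
        if rec.key == key and rec.deleted:
            rec.deleted = False
            return True
    return False


def live_records(file):
    """Records that take part in file processing."""
    return [rec for rec in file if not rec.deleted]


def physical_delete(file):
    """Remove every flagged record from the file for good."""
    file[:] = live_records(file)


students = [Record("06MX89", "Sukh Dev"), Record("06MX90", "hypothetical student")]
logical_delete(students, "06MX89")
print(len(students), len(live_records(students)))   # record is present but excluded
```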

In the case of the student file, addition of details pertaining to new students could call for 
insertion of appropriate student records. Students opting to drop out of the programme could 
have their records ‘logically’ or ‘physically’ deleted. Again, a change of address or a change in 
grades after revaluation, could call for updating the relevant fields of the record. 





Heap Or Pile Organization 14.5 


The heap or pile organization is one of the simplest of the file organizations. These are non-keyed 
sequential files. The records are maintained in no particular order. The insert, delete and update 
operations are undertaken as described below. This unordered file organization is basically 
suitable for instances where records are to be collected and stored for future use. 


Insert, delete and update operations 


Insert: To insert records into the heap or pile, the records are merely appended to the file. 


Delete: To delete records, either a physical deletion or logical deletion is done. In the case of 
physical deletion, either the record is physically deleted or the last record is brought forward to 
replace the deleted one. This indeed calls for several accesses. 


Update: Retrieving a record for update entails a linear search of the file which, in the worst 
case, could call for a search from the beginning to the end of the file. 
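
The three operations on a heap or pile can be sketched as below. It is a minimal in-memory illustration, assuming records are (key, data) pairs; the class name HeapFile and its methods are hypothetical. The physical deletion shown is the variant in which the last record is brought forward to replace the deleted one.

```python
class HeapFile:
    """Unordered (pile) file: records are kept in arrival order."""

    def __init__(self):
        self.records = []                     # list of (key, data)

    def insert(self, key, data):
        self.records.append((key, data))      # merely append to the file

    def find(self, key):
        """Linear search; returns the position of the record or -1."""
        for pos, (k, _) in enumerate(self.records):
            if k == key:
                return pos
        return -1

    def update(self, key, data):
        pos = self.find(key)
        if pos < 0:
            return False
        self.records[pos] = (key, data)
        return True

    def delete(self, key):
        """Physical deletion: the last record fills the vacated slot."""
        pos = self.find(key)
        if pos < 0:
            return False
        last = self.records.pop()
        if pos < len(self.records):           # deleted record was not the last one
            self.records[pos] = last
        return True


heap = HeapFile()
for k, d in [("k1", "a"), ("k2", "b"), ("k3", "c")]:
    heap.insert(k, d)
heap.delete("k1")
print(heap.records)
```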







Sequential File Organization 14.6 


Sequential files are ordered files maintained in a logical sequence of primary keys. The 
organization was primarily meant to satisfy the characteristics of magnetic tapes which are 
sequential devices. 


Insert, delete and update operations 


A sequential file is stored in the same logical sequence of its records, on the tape. Thus the 
physical and logical organization of sequential files are one and the same. Since random access 
is difficult on a tape, the handling of insert, delete and update operations could turn out to be 
expensive if they are handled on an individual basis. Therefore a batched mode of these 
operations is undertaken. 

For a sequential file S, those records which are to be inserted, deleted or updated are written 
on to a separate tape as a sequential file T. The file T, known as the transaction file, is ordered 
according to its primary keys. Here S is referred to as the master file. With both S and T ordered 
according to their primary keys, a maintenance program reads the two files and, while 
undertaking a “merge” operation, executes the relevant operation (insert / delete / update) in 
parallel. The updated file is available on an output tape. 

During the merge operation, in the case of insert operation, the new records merely get copied 
on to the output tape in the primary key ordering sequence. In the case of delete operation, the 
corresponding records are stopped from getting copied on to the output tape and are just 
skipped. For update operation, the appropriate fields are updated and the modified records are 
copied on to the output tape. Figure 14.3 illustrates the master maintenance procedure. 
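
The merge performed by the maintenance program can be sketched as follows. This is a simplified illustration, not a definitive implementation: both files are modelled as sorted Python lists, the function name maintain_master is hypothetical, and at most one transaction per key is assumed. A delete or update for a key absent from the master file is silently ignored here.

```python
def maintain_master(master, transactions):
    """Merge a sorted master file with a sorted transaction file.

    master:       list of (key, data) in ascending key order
    transactions: list of (key, op, data) in ascending key order,
                  with op one of "insert", "delete", "update"
    Returns the new master file; the old master is left untouched.
    """
    new_master = []
    i = j = 0
    while i < len(master) or j < len(transactions):
        if j == len(transactions) or (
            i < len(master) and master[i][0] < transactions[j][0]
        ):
            new_master.append(master[i])        # no transaction: copy as is
            i += 1
        elif i == len(master) or transactions[j][0] < master[i][0]:
            key, op, data = transactions[j]
            if op == "insert":
                new_master.append((key, data))  # lands in key order
            j += 1
        else:                                   # same key in both files
            key, op, data = transactions[j]
            if op == "update":
                new_master.append((key, data))
            elif op == "insert":
                new_master.append(master[i])    # duplicate insert: keep old record
            # op == "delete": the record is skipped, i.e. not copied
            i += 1
            j += 1
    return new_master


old_master = [(10, "a"), (20, "b"), (30, "c")]
transaction_file = [(15, "insert", "x"), (20, "delete", None), (30, "update", "C")]
print(maintain_master(old_master, transaction_file))
```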


[Figure: the transaction file and the old master file are read by the master maintenance program, which writes the new master file]

Fig. 14.3 Master file maintenance 


The new file that is available on the output tape is referred to as the new master file S^new. The 
advantage of this method is that it leaves a back up of the master file before it is updated. The 
file S at this point gets referred to as the old master file. In fact it is common to retain an ancestry 
of back up files depending on the needs of the application. In such a case, while the new master 
file would be referred to as the son file, the old master file would be referred to as the father file 
and the older master file as the grandfather file and so on. 


Making use of overflow blocks 


Since the updating of the file calls for a creation of a new file each time, an alternative could be 
to store the records in blocks with ‘holes’ in them. The ‘holes’ are vacant spaces at the tail end 
of the blocks. 




Insertions are accommodated in these ‘holes’. If there is no space to accommodate insertions 
in the appropriate blocks, the records are accommodated in special blocks called overflow blocks. 

Although this method renders insert operations efficient, retrievals could call for a linear 
search of the whole file. In the case of deletions it would be prudent to adopt logical deletions. 
However, when the number of logical deletions increases or when the overflow blocks are fast 
filling up, it is advisable to reorganize the entire file. 





Indexed Sequential File Organization 14.7 


While sequential file organizations provide efficient sequential access to data, random access to 
records is quite cumbersome. Indexed sequential file organizations are hybrid organizations 
which provide efficient sequential as well as random access to data. The method of storage and 
retrieval, known as Indexed Sequential Access Method (ISAM) makes use of indexes to facilitate 
random access of data, while the data themselves are maintained in a sequential order. The files 
following the ISAM method of storage and retrieval are also known as ISAM files. 


Structure of the ISAM files 


An ISAM file consists of 

(i) a primary storage area, where the data records of the file are sequentially stored, 

(ii) a hierarchy of indexes, where an index is a directory of information pertaining to the physical 
location of the records and 

(iii) overflow area(s) or block(s), where new records to be added to the file and which could not 
be accommodated in the primary storage area, are stored. 

Though ISAM files provide efficient retrieval of records, the operations of insertion and 
deletion can get quite involved and need to be efficiently handled. There are many methods to 
maintain indexes and efficiently handle insertions and deletions in the primary storage as well 
as overflow areas. 

The primary storage area is divided into blocks where each block can store a definite number 
of records. The data records of the ISAM file are distributed on to the blocks in the logical order 
of their sequence, one block after the other. Thus all records stored in a block B_i have keys 
greater than or equal to those of the records stored in the previous block B_{i-1}. 

The index is a two dimensional table with each entry indicative of the physical location of the 
records. An index entry is commonly a key - address pair, (K, B↑) where K is the key of the 
record and B↑ is a pointer to the block containing the record or sometimes a pointer to the record 
itself. An index is always maintained in the sorted order of the keys K. If the index maintains 
an entry for each record of the file then the index is an N x 2 table where N is the size of the file. 
In such a case the index is said to be a dense index. In the case of large files, the processing time 
of dense indexes can be large due to the huge amount of entries in them. To reduce the 
processing time, one could devise strategies so that only one entry per block is made in the index. 
In such a case the index is known as a sparse index. Commonly the entry could be pertaining to 
a special record in the block known as the block anchor. The block anchor could be either the first 
record (smallest key) or the last record (largest key) of the block. If the file occupies b blocks the 
size of the index would be b x 2. 




Example 14.1 Figure 14.4 illustrates a schematic diagram of a simple ISAM file. The records 
of the file are stored sequentially in the ascending order of their primary keys. The file occupies 
10 blocks each comprising 100 records. The last record of each block is chosen as the block anchor. 
Observe how the index maintains entries for each of the block anchors alone. The entries in the 
index are sorted according to the key values. 














[Figure: an index block of block anchors with pointers to the blocks of the primary storage area]





Fig. 14.4 Schematic diagram of a naive ISAM file 


Insert, Delete and Update operations for a simple ISAM file 


The insert, delete and update operations for a simple ISAM file are introduced here. However, it 
should be remembered that a variety of methods exist for maintaining indexes, each of which 
commands its own methods of operation. 


Insert To insert records, the records are to be first inserted in the primary storage area. The 
existing records in a block may have to be moved to make space for the new records. This in turn 
may call for a change in the index entries especially if the block anchors get shifted due to the 
insertions. 


A simple solution would be to provide ‘holes’ in the blocks where new records could be 
inserted. However the possibility of blocks overflowing cannot be ruled out. In such a case the 
new records are pushed into the overflow area, which merely maintains the records as an 
unordered list. Another option would be to maintain an overflow area for each block, as a sorted 
linked list of records. 







Delete The most convenient way to handle deletions is to undertake logical deletions making 
use of deletion markers. 


Update Retrieval of a record for update is quite efficient in an ISAM file. To retrieve 
a record with key K’, we merely undertake a linear search (or even a binary search) of the index 
table to find the first entry (K, B↑) such that K’ ≤ K, the block anchors here being the last records 
of the blocks. Following the pointer B↑, we linearly search the 
block of records to retrieve the desired record. However, if the record resides 
in the overflow blocks, the search procedure can turn out to be a bit more complex. 

For example, in the ISAM file structure shown in Fig. 14.4, to retrieve the record with key 255, 
we merely search the index to find the appropriate entry (286, B_i↑), where B_i↑ is the pointer to 
the block B_i whose anchor is 286. A linear search for the key 255 in block B_i retrieves the relevant record. 
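
The retrieval procedure can be sketched as follows, assuming a sparse index whose anchors are the last (largest) keys of the blocks and ignoring overflow areas. The function names and the sample keys are illustrative only; the keys are not a transcription of Fig. 14.4. The standard library routine bisect_left performs the binary search of the index.

```python
from bisect import bisect_left


def build_isam(sorted_keys, block_size):
    """Split sorted keys into blocks; index holds (last key of block, block number)."""
    blocks = [sorted_keys[i:i + block_size]
              for i in range(0, len(sorted_keys), block_size)]
    index = [(block[-1], number) for number, block in enumerate(blocks)]
    return blocks, index


def isam_search(blocks, index, key):
    """Return (block number, position within the block) or None."""
    anchors = [anchor for anchor, _ in index]
    pos = bisect_left(anchors, key)       # first anchor K with key <= K
    if pos == len(anchors):
        return None                       # key exceeds every anchor
    block_no = index[pos][1]
    for offset, k in enumerate(blocks[block_no]):   # linear search of the block
        if k == key:
            return block_no, offset
    return None


# Illustrative keys, three records per block
blocks, index = build_isam([18, 26, 154, 201, 255, 286, 330, 402, 517], 3)
print(index)
print(isam_search(blocks, index, 255))
```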


Types of indexing 


There are many methods of indexing files. Commonly, all methods of indexes make use of a 
single key field based on which the index entries are maintained. The key field is known as 
indexing field. A few of the indexing techniques are detailed here. 


Primary indexing This is one of the most common forms of indexing. The file is sorted 
according to its primary key. The primary index is a sparse index that contains the pair (K, B↑) 
as its entries, where K is the primary key of the block anchor and B↑ is the pointer to the block 
concerned. The indexing method used in the ISAM file illustrated in Example 14.1, is in fact 
primary indexing. The general operations of insert, delete and update discussed in Sec. 14.7 hold 
good for primary indexing based ISAM files. 


Multilevel indexing 


In the case of voluminous files, despite employing sparse indexes, searching through the index 
can itself become an overhead due to the large number of entries in the index. In such a case, to 
cut down the search time, a hierarchy of indexes, also known as multilevel indexes, is constructed. 
Multilevel indexes are but indexes over indexes. 

Example 14.2 discusses an ISAM file based on multilevel indexing. It can be seen that while 
the lowest level index points to the physical location of the blocks, the rest of the indexes point 
to their lower level indexes. To retrieve a record with key K, we begin from the highest level index 
and work our way down the indexes until the appropriate block is reached. A linear search of 
the block yields the record. 


Example 14.2 Figure 14.5 illustrates an ISAM file with multilevel indexing. The file has 
10,000 records and is stored in a sequential fashion. 400 blocks, each holding 25 records, make 
up the primary storage area. The file organization shows three levels of indexing. Observe 
how each of the higher level indexes is an index over the lower level indexes. To search for a key 
K we begin from the highest level index and follow the pointers to the lower level indexes. At 
the lowest level index we obtain the block address from which the record could be searched out. 
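
The descent through a hierarchy of indexes can be sketched as below. This is an illustration under assumptions: the keys 1 to 10,000 and the limit of 8 entries per index piece (called fanout here) are hypothetical choices that happen to yield three levels for 400 blocks of 25 records, and the last record of each block is taken as its anchor.

```python
from bisect import bisect_left


def build_levels(sorted_keys, block_size, fanout):
    """Return (blocks, levels); levels[0] is the lowest level index.

    Each level is a list of index pieces of at most `fanout` entries. An entry
    is (anchor key, number of the block or lower level piece it points to).
    """
    blocks = [sorted_keys[i:i + block_size]
              for i in range(0, len(sorted_keys), block_size)]
    entries = [(block[-1], n) for n, block in enumerate(blocks)]
    levels = []
    while True:
        pieces = [entries[i:i + fanout] for i in range(0, len(entries), fanout)]
        levels.append(pieces)
        if len(pieces) == 1:              # a single piece: the highest level
            return blocks, levels
        entries = [(piece[-1][0], n) for n, piece in enumerate(pieces)]


def multilevel_search(blocks, levels, key):
    """Return the number of the block holding key, or None."""
    piece_no = 0
    for pieces in reversed(levels):       # from the highest level downwards
        piece = pieces[piece_no]
        pos = bisect_left([k for k, _ in piece], key)
        if pos == len(piece):
            return None
        piece_no = piece[pos][1]          # pointer to the next level or block
    return piece_no if key in blocks[piece_no] else None


blocks, levels = build_levels(list(range(1, 10001)), block_size=25, fanout=8)
print(len(blocks), len(levels), multilevel_search(blocks, levels, 5000))
```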


Cluster indexing Typically, ISAM files have their records ordered sequentially according to 
the primary key values. It is possible that the records are ordered sequentially according to some 
non-key field that can carry duplicate values. In such a case the indexing field, which is the 
non-key field, is called the clustering field and the index is called the cluster index. 





[Figure: a highest level index, an intermediate level index and a lowest level index, each a table of (key, address) entries, leading to the primary storage area of blocks B1 to B400 with 25 records per block]


Fig. 14.5 An ISAM file based on multilevel indexing 


A cluster index is a sparse index. Like all other sparse indexes, a cluster index is also made up 
of entries of the form (I, B↑), where I is the clustering field value and B↑ is the block address 
of the appropriate record. For a group of records with the same value for I, B↑ indicates the block 
address of the first record in the group. The rest of the records in the group may be easily 
accessed by making a forward search in the block concerned, since the records in the primary 
storage area are already sorted according to the non-key field. 

However, in the case of cluster indexing, the insert / delete operations can, as before, become 
complex, since the data records are physically ordered. A straightforward strategy for efficient 
handling of insert / delete operations would be to maintain the block or cluster of blocks in such 
a way that all records holding the same value for their clustering field are stored in the same 
blocks or cluster of blocks. 


Example 14.3 An ISAM file based on cluster indexing is illustrated in Fig. 14.6. We consider 
the record structure of the employee file discussed in Sec. 14.3. department is used as the clustering 
field. Observe the duplicate values of the clustering field in the records. The blocks are maintained 
in such a way that records holding the same value for the clustering field are stored in the same 
block or cluster of blocks. 


Secondary indexing Secondary indexes are built on files for which some primary access 
already exists. In other words, the data records in the primary storage area are already sorted 
according to the primary key and available in blocks. The secondary indexing may be on a field 
that is a candidate key (distinct values) or on a non-key field (duplicate values). 

In the case of a secondary key field having distinct values, the index is a dense index with one 
entry (K, B↑), where K is the secondary key value and B↑ is the block address, for every record. 
The (K, B↑) entries are however ordered on the key K, to facilitate binary search on the index 
during retrievals of data. To retrieve the record with secondary key value K, the entire block of 
records pointed to by B↑ is transferred to the internal memory and a linear search is done to 
retrieve the record. 







[Figure: a cluster index of (clustering field value, pointer to block) entries for the department values Accounts, Administration, Mining and Transportation, pointing into a data file of employee records (employee number, name, social security number, department, designation) whose blocks are chained by block pointers]




Fig. 14.6 An ISAM file based on cluster indexing 


In the case of a secondary key field having duplicate values, there are various options available 
to construct a secondary index. The first is to maintain a dense index of (K, B↑) pairs where K 
is the secondary key value and B↑ is the block address, for every record. In such a case the index 
could carry several entries of (K, B↑) pairs, for the same value of K. The second option would 
be to maintain the index as consisting of variable length entries. Each index entry would be of 
the form (K, B1↑, B2↑, B3↑, ..., Bn↑) where the Bi↑’s are block addresses of the various records 
holding the same value for the secondary key K. A third option is a modification of the second 
where the Bi↑’s are maintained as a linked list of block pointers and the index entry is just (K, T↑) 
where T↑ is a pointer to the linked list. 

A file could have several secondary indexes defined on it. Secondary indexes find significant 
applications in query based interfaces to databases. 




Example 14.4 Figure 14.7 illustrates a secondary indexing based file. The secondary index 
implements its entries as a tuple comprising the secondary key value and the list of block 
addresses of the records with the particular key value. The sequential file available in the primary 
storage area is already sorted on its primary key. 


[Figure: a secondary index of (secondary key value, list of block addresses) entries pointing to the blocks of the primary storage area]


Fig. 14.7 Secondary indexing of a file 





Direct File Organization 14.8 


Direct file organizations make use of techniques to directly map the key values of their records to 
their respective physical location addresses. Commonly, the techniques employ some kind of 
arithmetic or mathematical functions to bring about the mapping. The direct mapping of the keys 
with their addresses paves the way for efficient retrievals and storage. 

Hash functions are prominent candidates used by direct files for bringing about the mapping 
between the keys and the addresses. Hash functions and hashing were elaborately discussed in 
Chapter 13. The application of hashing for the storage of files in the external memory is known 
as external hashing. 

Given a file of records {R_1, R_2, R_3, ..., R_N} with keys {k_1, k_2, k_3, ..., k_N}, a hash function H is 
employed to determine the storage address of each of the records in the storage device. Given a 
key k, H(k) yields the storage address. Unlike other file organizations where indexes are 
maintained to track the storage address area, direct files undertake direct mapping of the keys to 
the storage locations of the records. In practice the hash function H(k) yields a bucket number 
which is then mapped to the absolute block address in the disk. 







Buckets are designed to handle collisions amongst keys. Thus a group of synonyms share the 
same bucket. In the case of overflow of a bucket, a common solution employed is to maintain 
overflow buckets with links to their original buckets. Severe overflows to the same bucket may 
call for multiple overflow buckets, each linked to the other. This may however degrade the 
performance of the file organization during a retrieval operation. If a deletion leaves an overflow 
bucket empty, then the bucket is removed and perhaps could be inserted into a linked list of 
empty overflow buckets for future use. 
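
The handling of buckets and chained overflow buckets can be sketched as follows. This is a simplified in-memory illustration: the class names, the hash function k mod 8, the bucket capacity of 2 and the sample keys are all assumptions, and the buckets hold keys only, the pointers to the storage blocks being left out. An emptied overflow bucket is simply unlinked here rather than kept on a list of free buckets.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = []
        self.overflow = None              # link to the next overflow bucket


class DirectFile:
    def __init__(self, n_buckets=8, capacity=2):
        self.n = n_buckets
        self.capacity = capacity
        self.buckets = [Bucket(capacity) for _ in range(n_buckets)]

    def _hash(self, key):
        return key % self.n               # assumed hash function

    def insert(self, key):
        b = self.buckets[self._hash(key)]
        while len(b.keys) == b.capacity:  # full: follow or extend the chain
            if b.overflow is None:
                b.overflow = Bucket(self.capacity)
            b = b.overflow
        b.keys.append(key)

    def search(self, key):
        """Number of buckets probed to find key, or 0 if it is absent."""
        b, probes = self.buckets[self._hash(key)], 0
        while b is not None:
            probes += 1
            if key in b.keys:
                return probes
            b = b.overflow
        return 0

    def delete(self, key):
        prev, b = None, self.buckets[self._hash(key)]
        while b is not None and key not in b.keys:
            prev, b = b, b.overflow
        if b is None:
            return False
        b.keys.remove(key)
        if not b.keys and prev is not None:   # emptied overflow bucket
            prev.overflow = b.overflow        # unlink it from the chain
        return True


f = DirectFile()
for k in (7, 15, 23):                     # synonyms: all hash to bucket 7
    f.insert(k)
print(f.search(7), f.search(23))          # 23 sits in the overflow bucket
```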


Example 14.5 Figure 14.8 illustrates the overall structure of a direct file organization. Each 
bucket records the synonym keys and the pointers to their storage locations. The storage area is 
divided into blocks which hold a group of records. Note the overflow buckets which take care 
of synonyms that overflowed from their respective buckets. 


[Figure: buckets 1 to n of (key, address) entries pointing to blocks 1 to m of records in the storage area, with overflow buckets linked to their original buckets]








Fig. 14.8 A direct file organization 










Summary 


> The internal memory or the primary memory of a computer is limited in capacity. To 
handle voluminous data, a computer system resorts to external memory or secondary 
memory. Magnetic tapes, disks and drums are examples of secondary storage devices. 

> A file is a collection of records and a record is a collection of fields. File organizations 
are methods or strategies for the efficient storage and retrieval of data. While the 
organization of records in a file refers to its logical organization, the storage of the records 
on the secondary storage device refers to its physical organization. 

> A primary key, or a key, is a field or a collection of fields that uniquely identifies a record. 
Candidate keys, super keys, secondary keys and alternate keys are other terms associated 
with the keys of a file. 

> Files support a variety of operations such as open, close, read, insert, delete, update and 
reset. 

> A heap or pile organization is a non-keyed file where records are not maintained in any 
particular order. 

> A sequential file organization maintains its records in the order of its primary keys. The 
insert, delete and update operations are carried out in a batched mode, leading to the 
creation of transaction and new master files. The operations could also be handled by 
making use of overflow blocks. 

> Indexed sequential files offer efficient sequential and random access to their data records. 
The random access is made possible by making use of indexes. A variety of indexing based 
file organizations are possible by employing various types of indexing. Primary indexing, 
multilevel indexing, cluster indexing and secondary indexing are some of the important 
types of indexing. 

> Direct file organizations make use of techniques to map their keys to the physical storage 
addresses of the records. Hash functions are a popular choice to bring about this mapping. 


Illustrative Problems 


Problem 14.1 The primary keys of a sample set of records are listed below. Assuming that 
the primary storage area accommodates 7 records / block and that the first record of the block is 
chosen as the block anchor, outline a schematic diagram for an ISAM file organization of the 
records built on primary indexing. 

007 024 116 244 356 359 386 451 484 496 525 584 591 614 622 646 678 785 981 
991 999 1122 1466 2468 3469 4567 8907 


Solution: The schematic diagram of the ISAM file organization based on primary indexing is 
shown in Fig. I 14.1. 
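
The blocks and the primary index asked for can also be worked out mechanically. The following is a minimal sketch, with the keys written as integers (007 becomes 7) and with the block names B1, B2, ... and the function find_block introduced only for illustration. Since the first record of each block is the anchor, a key belongs to the block with the largest anchor not exceeding it.

```python
from bisect import bisect_right

keys = [7, 24, 116, 244, 356, 359, 386, 451, 484, 496, 525, 584, 591, 614,
        622, 646, 678, 785, 981, 991, 999, 1122, 1466, 2468, 3469, 4567, 8907]

BLOCK_SIZE = 7
blocks = [keys[i:i + BLOCK_SIZE] for i in range(0, len(keys), BLOCK_SIZE)]

# One (anchor key, block name) entry per block, the anchor being the first record
primary_index = [(block[0], f"B{n}") for n, block in enumerate(blocks, start=1)]


def find_block(key):
    """Name of the block that would hold key, or None if key precedes all anchors."""
    pos = bisect_right([anchor for anchor, _ in primary_index], key) - 1
    return primary_index[pos][1] if pos >= 0 else None


print(primary_index)
print(find_block(525))
```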




[Figure: a primary index whose entries point (B1↑ to B4↑) to blocks B1 to B4 of the primary storage area]

Fig. I 14.1 


Problem 14.2 For the sample set of records shown in Illustrative Problem I 14.1, design an 
ISAM file organization based on multilevel indexing for two levels of indexes. Assume a block 
size of 4 in the primary storage area and the first record of the block as the block anchor. 


Solution: The schematic diagram of the ISAM file organization based on multilevel indexing 
for two levels of indexes is shown in Fig. I 14.2. 


Problem 14.3 For the following used car file with a record structure as shown below, design 
a secondary indexing based file organization making use of the sample set of records shown in 
Table I 14.3. Here, vehicle registration number is the primary key. Assume a block size of 
2 records, in the primary storage area. Design secondary indexes using the fields (i) year of 
registration and (ii) colour. 

Used car record structure: 

vehicle registration number | year of registration | colour | ... 


Table I 14.3 

Vehicle registration number | Year of registration | Colour | ... 
TN4117 | 1990 | Pearl white | Prestige 
TN4623 | 1990 | Silky silver | Pride 
TN5724 | 1991 | Metallic blue | Pride 
TN6234 | 1994 | Silky silver | Sarathi 
TN7146 | 1994 | Metallic blue | Sarathi 
TN7245 | 1994 | Pearl white | Pride 
TN8436 | 1995 | Black | Prestige 
TN8538 | 1996 | Pearl white | Flight 










[Figure: a highest level index with entries for keys 007 and 678 pointing to the lowest level indexes I1 and I2, which in turn point to blocks B1 to B7 of the primary storage area]

Fig. I 14.2 


Solution: The schematic diagram for the secondary indexing of the used car file is shown 
in Fig. I 14.3. Both the indexes are shown in the same figure. Observe how the data records are 
ordered according to the primary key in the primary storage area. 
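
The two secondary indexes can be derived programmatically as well. The sketch below assumes the year and colour values of the sample records of the used car file, stores two records per block in primary key order, and builds each index entry as a secondary key value with its list of block addresses. The function name secondary_index and the block names are illustrative.

```python
from collections import defaultdict

# (registration number, year of registration, colour), in primary key order
cars = [
    ("TN4117", 1990, "Pearl white"), ("TN4623", 1990, "Silky silver"),
    ("TN5724", 1991, "Metallic blue"), ("TN6234", 1994, "Silky silver"),
    ("TN7146", 1994, "Metallic blue"), ("TN7245", 1994, "Pearl white"),
    ("TN8436", 1995, "Black"), ("TN8538", 1996, "Pearl white"),
]

BLOCK_SIZE = 2
blocks = {f"B{n + 1}": cars[i:i + BLOCK_SIZE]
          for n, i in enumerate(range(0, len(cars), BLOCK_SIZE))}


def secondary_index(blocks, field):
    """Map each secondary key value to the list of blocks holding it."""
    index = defaultdict(list)
    for name, records in blocks.items():
        for record in records:
            if name not in index[record[field]]:
                index[record[field]].append(name)
    return dict(sorted(index.items()))    # entries ordered on the key


year_index = secondary_index(blocks, 1)
colour_index = secondary_index(blocks, 2)
print(year_index)
print(colour_index)
```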


Problem 14.4 For the used car file discussed in Illustrative Problem I 14.3, design a 
cluster index based file organization on the non-key field year of registration. Assume that 
the blocks in the primary storage area can hold up to 2 records each. 


Solution: Figure I 14.4 illustrates the schematic diagram for the cluster index based file 
organization for the used car file. 
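
One way of arriving at such an organization is sketched below, assuming the year values of the sample used car records. Records are sorted on the clustering field, every new field value starts a fresh block, and a cluster that outgrows its block spills into the next block. The function name cluster_blocks is hypothetical, and block numbers stand in for block addresses.

```python
# (registration number, year of registration) of the sample records
cars = [
    ("TN4117", 1990), ("TN4623", 1990), ("TN5724", 1991), ("TN6234", 1994),
    ("TN7146", 1994), ("TN7245", 1994), ("TN8436", 1995), ("TN8538", 1996),
]


def cluster_blocks(records, field, block_size):
    """Return (blocks, cluster index); the index maps a clustering field
    value to the number of the first block of its cluster (numbered from 1)."""
    blocks, index = [], {}
    for record in sorted(records, key=lambda r: (r[field], r[0])):
        value = record[field]
        if value not in index:
            index[value] = len(blocks) + 1    # first block of a new cluster
            blocks.append([record])
        elif len(blocks[-1]) == block_size:
            blocks.append([record])           # cluster spills into a next block
        else:
            blocks[-1].append(record)
    return blocks, index


blocks, cluster_index = cluster_blocks(cars, field=1, block_size=2)
print(cluster_index)
print(len(blocks))
```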


Problem 14.5 Design a direct file organization using a hash function, to store an item 
file with item number as its primary key. The primary keys of a sample set of records of the 
item file are listed below. Assume that the buckets can hold 2 records each and the blocks in 
the primary storage area can accommodate a maximum of 4 records each. Make use of the hash 
function h(k) = k mod 8, where k represents the numerical value of the primary key (item number). 


369 760 692 871 659 975 981 115 620 208 821 111 554 781 181 965 


Solution: Figure I 14.5 represents the schematic diagram of the direct file organization of the 
item file using a hash function. Table I 14.5 represents the hash function values of the primary 
keys. 
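
The hash function values and the resulting bucket contents can be checked with a few lines of code. In this sketch each bucket number maps to a chain whose first list is the main bucket and whose further lists are overflow buckets; this chain representation is an assumption made for illustration, and the pointers to the storage blocks are omitted.

```python
keys = [369, 760, 692, 871, 659, 975, 981, 115,
        620, 208, 821, 111, 554, 781, 181, 965]

BUCKET_CAPACITY = 2


def h(k):
    return k % 8


# bucket number -> [main bucket, first overflow bucket, second overflow bucket, ...]
chains = {b: [[]] for b in range(8)}
for k in keys:
    chain = chains[h(k)]
    if len(chain[-1]) == BUCKET_CAPACITY:
        chain.append([])                  # open a further overflow bucket
    chain[-1].append(k)

for b, chain in chains.items():
    print(b, chain)
```

Under these assumptions only buckets 5 and 7 overflow, and the key 111 is the sole occupant of its overflow bucket.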




[Figure: two secondary indexes, one on colour and one on year of registration, each entry holding a secondary key value and the pointers of the blocks containing the matching records]

Primary storage area 

Block | Vehicle registration number | Year of registration | Colour | ... 
B1 | TN 4117 | 1990 | Pearl white | ... 
B1 | TN 4623 | 1990 | Silky silver | ... 
B2 | TN 5724 | 1991 | Metallic blue | ... 
B2 | TN 6234 | 1994 | Silky silver | ... 
B3 | TN 7146 | 1994 | Metallic blue | ... 
B3 | TN 7245 | 1994 | Pearl white | ... 
B4 | TN 8436 | 1995 | Black | ... 
B4 | TN 8538 | 1996 | Pearl white | ... 

Fig. I 14.3 


[Figure: a cluster index of (clustering field value, pointer to block) entries over blocks B1 to B6, with a pointer to the next block in the cluster where a cluster spans more than one block]

Fig. I 14.4 





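[Table I 14.5: hash function values h(k) = k mod 8 of the primary keys of the item file]

Table I 14.5

[Fig. I 14.5: direct file organization of the item file. Each main bucket, identified by its bucket
number, holds up to two (key, block address) entries and is linked to overflow buckets holding
further (key, block address) entries. The block addresses (#Bi = address of block Bi) point to the
data records stored in the blocks of the primary storage area.]

Fig. I 14.5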


Problem 14.6 For the direct file organization of the item file constructed in Illustrative 
Problem 14.5, undertake the following operations independently: 

(i) Insert item records with primary keys 441 and 805 

(ii) Delete the records with primary keys 369 and 111. 


Solution: Figure I 14.6(a) illustrates the insertion of the keys 441 and 805 into the direct file. 
Figure I 14.6(b) illustrates the deletion of the keys 369 and 111. The delete operations are undertaken 
independent of the insert operations. The affected portions of the file alone are shown in the 
figures. 

It can be observed how the deletion of the key 111 empties the overflow bucket, as a result 
of which the entire empty bucket gets removed. 


Review Questions


1. A minimal superkey is in fact a ________
(a) secondary key (b) primary key (c) non key (d) none of these 
2. State whether true or false: 
(i) A cluster index is a sparse index 
(ii) A secondary key field with distinct values yields a dense index 
(a) (i) true (ii) true (b) (i) true (ii) false (c) (i) false (ii) true (d) (i) false (ii) false 





[Fig. I 14.6(a): main buckets (bucket number; key, block address entries) and primary storage area
after the operations Insert 441 and Insert 805]

Fig. I 14.6(a)

[Fig. I 14.6(b): buckets (bucket number; key, block address entries), overflow bucket and primary
storage area after 369 and 111 are deleted. An asterisk (*) against a record with key k denotes
logical deletion of that record.]

Fig. I 14.6(b)


3. An index consisting of variable length entries where each index entry would be of the
form $(K, B_1\uparrow, B_2\uparrow, B_3\uparrow, \ldots, B_n\uparrow)$ where the $B_i\uparrow$'s are block addresses of the various records
holding the same value for the secondary key K can occur only in


(a) primary indexing (b) secondary indexing 

(c) cluster indexing (d) multilevel indexing 
4. Match the following: 

(A) heap file organization (i) transaction file 

(B) sequential file organization (ii) non keyed 

(C) ISAM file organization (iii) hash function 


(D) direct file organization (iv) indexing 


(a) (A, (i)) (B, (iv)) (C, (iii)) (D, (ii))
(b) (A, (ii)) (B, (iv)) (C, (iii)) (D, (i))
(c) (A, (ii)) (B, (i)) (C, (iv)) (D, (iii))
(d) (A, (iii)) (B, (i)) (C, (ii)) (D, (iv))
5. Find the odd term out in the context of basic file operations:
open close update delete evaluate read
(a) close (b) read (c) open (d) evaluate
6. Distinguish between primary memory and secondary memory.
7. Give examples for (i) superkey (ii) primary key (iii) secondary key (iv) alternate key.
8. How are insertions and deletions carried out in a pile?
9. Distinguish between logical and physical deletion of records.
10. Compare the merits and demerits of a heap file with those of a sequential file organization.
11. How do ISAM files ensure random access of data?
12. What is the need for multilevel indexing in ISAM files?
13. When are cluster indexes used?
14. How are secondary indexes maintained?
15. What is external hashing?
16. A file comprises the following sample set of primary keys. The block size in the primary
storage area is 2. Design an ISAM file organization based on (i) primary indexing and (ii)
multilevel indexing (level = 3).

090 890 678 654 234 123 245 678 900 111 453 231 112 679 238 876 311 433
544 655 766 877 988 009 122 233 344 566 677 899 909 512 612 723 823 956

17. Making use of the hash function h(k) = k mod 11, where k is the key, design a direct file
organization for the sample file (list of primary keys) shown in Review Question 16
(Chapter 14). Assume that the bucket size is 3 and the block size is 4.
18. Assume that the sample file (list of primary keys) shown in Review Question 16 (Chapter 14)
had a field called category which carries the character ‘A’ if the primary key is odd and ‘B’
if the primary key is even. Design a cluster index based file organization built on the field
category. Assume a block size of 4.


Programming Assignments


1. Implement the used car file discussed in Illustrative Problem 14.3 in a programming
language of your choice that supports the data structures of files and records. Experiment
on the basic operations of a file. What other operations does the language support to
enhance the use of the file? Write a menu driven program to implement the operations.
2. Assume that the used car file was implemented as a sequential file. Simulate the
batched mode of updating the sequential file by creating a transaction file of insertions
(details of cars that are brought in for sale) and deletions (cars that were sold out), to update
the existing master file.
3. A movie file has the following record structure:


Assume that the name of the movie is the primary key of the file. The field type refers
to the type of the movie viz., drama, sci-fi, horror, crime thriller, comedy etc. Input a sample
set of records of your choice into the movie file.
(i) Implement a primary index based ISAM file organization.
(ii) Implement secondary indexes on director, type and production cost.
(iii) How could the secondary index based file organization in Programming Assignment 3
(Chapter 14) (ii) be used to answer a query such as “Who are the directors who have
directed films of the type comedy or drama incurring the highest production cost?”
4. A company provides reimbursement of mobile phone subscription charges to its employees
belonging to the managerial cadre and above. The following record structure captures the
details. employee number, which is designated as the primary key, is a numerical 3-digit key.
type refers to the post paid or pre paid class of subscription to the mobile service.
subscription charges refers to the charges incurred by the employee at the end of every
month.


For a sample set of records implement the file as
(a) an array of records (block size = 1), and
(b) an array of pointers to records (assume that each pointer to record is a linked list of
two nodes, each representing a record. In other words, each block is a linked list of
two nodes (block size = 2)).
Make use of an appropriate hash function to design a direct file organization for
the said file. Write a menu driven program, which
(1) inserts new records, when recruitments or promotions to the managerial cadre are
made,
(2) deletes records, when the employees concerned relinquish duties or terminate mobile
usage due to various reasons and
(3) updates records regarding the subscription charges at the end of every month, changes
if any, in type and designation fields etc.
5. Make use of a random number generator to generate a list of 500 three digit numbers. Create
a sequential list FILE of the 500 numbers. Artificially implement storage blocks on the
sequential list with every block containing a maximum of 10 numbers only. Open an index
INDX over the sequential list FILE which records the highest key in each storage block and
the address of the storage block. Implement Indexed Sequential search to look for keys K
in FILE. Compare the number of comparisons made by the search with that of Sequential
search for the same set of keys.
Extend the implementation to include an index over the index INDX.


CHAPTER 15


SEARCHING


15.1 Introduction
15.2 Linear search
15.3 Transpose sequential search
15.4 Interpolation search
15.5 Binary search
15.6 Fibonacci search
15.7 Other search techniques


Introduction 15.1


Search (or Searching) is a commonplace occurrence in everyday life. Searching for a book in the
library, searching for a subscriber’s telephone number in the telephone directory, and searching
for one’s name in the electoral rolls are some examples.

In the discipline of computer science, the problem of search has assumed enormous significance.
It spans a variety of applications, rather disciplines, beginning from searching for a key in a list of
data elements to searching for a solution to a problem in its search space. Innumerable problems
exist where one searches for patterns (images, voice, text, hypertext, photographs etc.) in a
repository of data or patterns, for the solution of the problems concerned. A variety of search
algorithms and procedures appropriate to the problem and the associated discipline exist in the
literature.

In this chapter we enumerate search algorithms pertaining to the problem of looking for a key 
K in a list of data elements. When the list of data elements is represented as a linear list the search 
procedures of linear search or sequential search, transpose sequential search, interpolation search, binary 
search and Fibonacci search are applicable. When the list of data elements is represented using non 
linear data structures such as binary search trees or AVL trees or B trees etc., the appropriate tree 
search techniques unique to the data structure representation may be applied. Hash tables also 
promote efficient searching. Search techniques such as breadth first search and depth first search are 
applicable on graph data structures. In the case of data representing an index of a file or a group 
of ordered elements, indexed sequential search may be employed. This chapter discusses all the 
above mentioned search procedures. 


Linear Search 15.2 


A linear search or sequential search is one where a key K is searched for, in a linear list L of data 
elements. The list L is commonly represented using a sequential data structure such as an array. 
If L is ordered then the search is said to be an ordered linear search and if L is unordered then it 
is said to be unordered linear search. 




Ordered linear search 


Let $L = \{K_1, K_2, K_3, \ldots, K_n\}$, $K_1 < K_2 < \cdots < K_n$ be the list of ordered elements. To search for a key
K in the list L, we undertake a linear search comparing K with each of the $K_i$. So long as $K > K_i$,
comparing K with the data elements of the list L progresses. However, if $K \le K_i$, then if $K = K_i$
the search is done, otherwise the search is unsuccessful, implying K does not exist in the list
L. It is easy to see how ordering the elements renders the search process efficient: an
unsuccessful search can stop at the first element that exceeds K.
Algorithm 15.1 illustrates the working of ordered linear search. 


Algorithm 15.1: Procedure for ordered linear search


procedure LINEAR_SEARCH_ORDERED(L, n, K)
/* L[0:n-1] is a linear ordered list of data elements. K
   is the key to be searched for in the list. In case of
   unsuccessful search, the procedure prints the message "KEY
   not found" otherwise prints "KEY found" and returns the
   index i */

   i = 0;
   while ((i < n) and (K > L[i])) do      /* search for K down the list */
      i = i + 1;
   endwhile
   if ((i < n) and (K = L[i])) then       /* i < n guards against falling off the list */
      { print ("KEY found");
        return (i); }                     /* Key K found. Return index i */
   else
      print ("KEY not found");
end LINEAR_SEARCH_ORDERED.


Example 15.1 Consider an ordered list of elements L[0:5] = {16, 18, 56, 78, 90, 100}. Let us
search for the key K = 78. Since K is greater than 16, 18, and 56, the search terminates at the fourth
element when K ≤ (L[3] = 78) is true. At this point, since K = L[3] = 78, the search is successfully
done. However in the case of searching for key K = 67, the search progresses until the condition
K ≤ (L[3] = 78) is reached. At this point since K ≠ L[3], we deem the search to be unsuccessful.

Ordered linear search reports a time complexity of O(n) in the worst case and O(1) in the best 
case, in terms of key comparisons. However, it is essential that the elements are ordered before 
the search process is undertaken. 
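As a minimal runnable sketch of Algorithm 15.1 (the function name and the convention of returning -1 for an unsuccessful search are illustrative choices, not part of the algorithm as stated), the early exit on the first element that is not smaller than K can be written as:

```python
def linear_search_ordered(L, K):
    """Return the 0-based index of K in the ascending list L, or -1 if absent."""
    i = 0
    # Advance only while the current element is smaller than K.
    while i < len(L) and K > L[i]:
        i += 1
    # Either we fell off the list, or L[i] >= K; only equality is a hit.
    if i < len(L) and L[i] == K:
        return i
    return -1

L = [16, 18, 56, 78, 90, 100]
print(linear_search_ordered(L, 78))   # successful search of Example 15.1
print(linear_search_ordered(L, 67))   # unsuccessful search of Example 15.1
```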


Unordered linear search 


In this search, a key K is looked for in an unordered linear list $L = \{K_1, K_2, K_3, \ldots, K_n\}$ of data
elements. The method, obviously of the ‘brute force’ kind, merely calls for a sequential search
down the list looking for the key K.


Algorithm 15.2 illustrates the working of the unordered linear search. 


Algorithm 15.2: Procedure for unordered linear search


procedure LINEAR_SEARCH_UNORDERED(L, n, K)
/* L[0:n-1] is a linear unordered list of data elements.
   K is the key to be searched for in the list. In case
   of unsuccessful search, the procedure prints the message
   "KEY not found" otherwise prints "KEY found" and returns
   the index i */

   i = 0;
   while ((i < n) and (L[i] ≠ K)) do      /* search for K down the list */
      i = i + 1;
   endwhile
   if ((i < n) and (L[i] = K)) then       /* i < n guards against falling off the list */
      { print ("KEY found");
        return (i); }                     /* Key K found. Return index i */
   else
      print ("KEY not found");
end LINEAR_SEARCH_UNORDERED.


Example 15.2 Consider an unordered list L[0:5] = { 23, 14, 98, 45, 67, 53} of data elements. 
Let us search for the key K = 53. Obviously the search progresses down the list comparing key 
K with each of the elements in the list until it finds it as the last element in the list. In the case 
of searching for the key K = 110, the search progresses but falls off the list thereby deeming it 
to be an unsuccessful search. 

Unordered linear search reports a worst case complexity of O(n) and a best case complexity of 
O(1) in terms of key comparisons. However, its average case performance in terms of key 
comparisons can only be inferior to that of ordered linear search. 
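The difference between the two variants shows up in unsuccessful searches. The sketch below is an illustration under assumed conventions: both functions return a pair (index, number of elements examined), with index -1 when the key is absent, and the function names are hypothetical.

```python
def search_unordered(L, K):
    """Sequential search of an unordered list: returns (index, elements examined)."""
    for i, item in enumerate(L):
        if item == K:
            return i, i + 1
    return -1, len(L)            # every element was examined

def search_ordered(L, K):
    """Sequential search of an ascending list: stops at the first element >= K."""
    for i, item in enumerate(L):
        if item >= K:
            return (i if item == K else -1), i + 1
    return -1, len(L)

unordered = [23, 14, 98, 45, 67, 53]      # list of Example 15.2
print(search_unordered(unordered, 53))    # found as the last element
print(search_unordered(unordered, 110))   # falls off the list
print(search_ordered(sorted(unordered), 30))   # stops early at 45
```

For a key that is absent, the unordered search always examines all n elements, whereas the ordered search stops as soon as it meets an element larger than the key.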


Transpose Sequential Search 15.3 


Also known as Self organizing sequential search, Transpose sequential search searches a list of data 
items for a key, checking itself against the data items one at a time in a sequence. If the key is 
found, then it is swapped with its predecessor and the search is termed successful. The swapping 
of the search key with its predecessor, once it is found, favours faster search when one repeatedly 
looks for the key. The more frequently one looks for a specific key in a list, the faster the retrievals 
take place in transpose sequential search, since the found key moves towards the beginning of the 
list with every retrieval operation. Thus transpose sequential search is most successful when a 
few data items are repeatedly looked for in a list. 
Algorithm 15.3 illustrates the working of transpose sequential search. 


Algorithm 15.3: Procedure for transpose sequential search


procedure TRANSPOSE_SEQUENTIAL_SEARCH(L, n, K)
/* L[0:n-1] is a linear unordered list of data
   elements. K is the key to be searched for in the
   list. In case of unsuccessful search, the procedure
   prints the message "KEY not found" otherwise prints
   "KEY found" and swaps the key with its predecessor
   in the list */

   i = 0;
   while ((i < n) and (L[i] ≠ K)) do      /* search for K down the list */
      i = i + 1;
   endwhile
   if ((i < n) and (L[i] = K)) then
      { print ("KEY found");              /* key K found */
        if (i > 0) then
           swap(L[i], L[i-1]);            /* swap key with its
                                             predecessor in the list */
      }
   else
      print ("KEY not found");
end TRANSPOSE_SEQUENTIAL_SEARCH.


Example 15.3 Consider an unordered list L = { 34, 21, 89, 45, 12, 90, 76, 62} of data elements. 

Let us search for the following elements in the order of their appearance: 
90, 89, 90, 21, 90, 90.

Transpose sequential search proceeds to find each key by the usual process of checking it against
each element of L one at a time. However, once the key is found it is swapped with its predecessor
in the list. Table 15.1 illustrates the number of comparisons made for each key during its search.
The list L before and after each search operation is also shown in the table. In each row, the list
after the search differs from the list before the search only in the swapped pair, namely the search
key and its predecessor. Observe how the number of comparisons made for the retrieval of 90,
which is repeatedly looked for in the search list, decreases with each search operation.


Table 15.1 Transpose sequential search of {90, 89, 90, 21, 90, 90} in the list L = {34, 21, 89,
45, 12, 90, 76, 62}


Search key | List L before search              | Number of element comparisons made during the search | List L after search
90         | {34, 21, 89, 45, 12, 90, 76, 62} | 6 | {34, 21, 89, 45, 90, 12, 76, 62}
89         | {34, 21, 89, 45, 90, 12, 76, 62} | 3 | {34, 89, 21, 45, 90, 12, 76, 62}
90         | {34, 89, 21, 45, 90, 12, 76, 62} | 5 | {34, 89, 21, 90, 45, 12, 76, 62}
21         | {34, 89, 21, 90, 45, 12, 76, 62} | 3 | {34, 21, 89, 90, 45, 12, 76, 62}
90         | {34, 21, 89, 90, 45, 12, 76, 62} | 4 | {34, 21, 90, 89, 45, 12, 76, 62}
90         | {34, 21, 90, 89, 45, 12, 76, 62} | 3 | {34, 90, 21, 89, 45, 12, 76, 62}





The worst case complexity in terms of comparisons for finding a specific key in the list L is 
O(n). In the case of repeated searches for the same key the best case would be O(1). 
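The self-organizing behaviour can be traced with a short sketch. It assumes a function that mutates the list in place and returns the number of element comparisons (or None for an absent key); these conventions and the function name are illustrative.

```python
def transpose_sequential_search(L, K):
    """Search L for K; on success swap K with its predecessor, in place.

    Returns the number of element comparisons made, or None if K is absent.
    """
    for i, item in enumerate(L):
        if item == K:
            if i > 0:                       # the first element has no predecessor
                L[i - 1], L[i] = L[i], L[i - 1]
            return i + 1
    return None

L = [34, 21, 89, 45, 12, 90, 76, 62]       # list of Example 15.3
counts = [transpose_sequential_search(L, k) for k in [90, 89, 90, 21, 90, 90]]
print(counts)   # comparisons per search
print(L)        # list after all six searches
```

Successive searches for 90 take 6, 5, 4 and then 3 comparisons, since each successful retrieval moves the key one position towards the front of the list.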


Interpolation Search 15.4 





Some search methods employed in every day life can be interesting. For example, when one looks 
for the word “beatitude” in the dictionary, it is quite common for one to turn over pages 
occurring at the beginning of the dictionary, and when one looks for “tranquility”, to turn over 
pages occurring towards the end of the dictionary. Also, it needs to be observed how during the 
search we turn sheaves of pages back and forth, if the word that is looked for occurs before or 
beyond the page that has just been turned. In fact, one may look askance at anybody who ‘dares’ 
to undertake sequential search to look for “beatitude” or “tranquility” in a dictionary! 







Interpolation search is based on this principle of attempting to look for a key in a list of
elements, by comparing the key with specific elements at “calculated” positions and checking
whether the key occurs “before” it or “after” it until either the key is found or not found. The list of
elements must be ordered and we assume that they are uniformly distributed with respect to
requests.

Let us suppose we are searching for a key K in a list $L = \{K_1, K_2, K_3, \ldots, K_n\}$, $K_1 < K_2 < \cdots < K_n$ of
numerical elements. When it is known that K lies between $K_{low}$ and $K_{high}$, (i.e.) $K_{low} < K < K_{high}$,
then the next element that is to be probed for key comparison is chosen to be the one that lies
$\frac{K - K_{low}}{K_{high} - K_{low}}$ of the way between $K_{low}$ and $K_{high}$. It is this consideration that has made the search
to be termed interpolation search.

During the implementation of the search procedure, the next element to be probed in a sublist
$\{K_i, K_{i+1}, K_{i+2}, \ldots, K_j\}$ for comparison against the key K is given by $K_{mid}$ where mid is given by

$mid = \left\lfloor i + (j - i) \cdot \frac{K - K_i}{K_j - K_i} \right\rfloor$

The key comparison results in any one of the following cases:

If ($K = K_{mid}$) then the search is done.
If ($K < K_{mid}$) then continue the search in the sublist $\{K_i, K_{i+1}, K_{i+2}, \ldots, K_{mid-1}\}$
If ($K > K_{mid}$) then continue the search in the sublist $\{K_{mid+1}, K_{mid+2}, K_{mid+3}, \ldots, K_j\}$

Algorithm 15.4 illustrates the interpolation search procedure.


Algorithm 15.4: Procedure for interpolation search


procedure INTERPOLATION_SEARCH(L, n, K)
/* L[1:n] is a linear ordered list of data elements.
   K is the key to be searched for in the list. In case
   of unsuccessful search, the procedure prints the message
   "Key not found" otherwise prints "Key found". */

   i = 1;
   j = n;
   if ((K < L[i]) or (K > L[j])) then { print ("Key not found"); exit(); }
                     /* if the key K does not lie within the
                        list then print "Key not found" */
   found = false;
   while ((i ≤ j) and (found = false)) do
      if (L[i] = L[j]) then mid = i;      /* single distinct key left: avoids division by zero */
      else mid = ⌊ i + (j - i) · (K - L[i]) / (L[j] - L[i]) ⌋;
      case
        : K = L[mid] : { found = true; print ("Key found"); }
        : K < L[mid] : j = mid - 1;
        : K > L[mid] : i = mid + 1;
      endcase
   endwhile
   if (found = false) then print ("Key not found");
                     /* key K not found in the list L */
end INTERPOLATION_SEARCH.







Example 15.4 Consider a list L = {24, 56, 67, 78, 79, 89, 90, 95, 99} of ordered numerical
elements. Let us search for the set of keys { 67, 45}. Table 15.2 illustrates the trace of Algorithm 
15.4 for the set of keys. The algorithm proceeds with its search since both the keys lie within the 
list. 


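Table 15.2 Trace of Algorithm 15.4 during the search for keys 67 and 45


Search key K | i | j | mid                                      | K vs L[mid]       | found
67           | 1 | 9 | ⌊1 + (9−1)·(67−24)/(99−24)⌋ = ⌊5.59⌋ = 5 | 67 < (L[5] = 79)  | false
             | 1 | 4 | ⌊1 + (4−1)·(67−24)/(78−24)⌋ = ⌊3.39⌋ = 3 | 67 = (L[3] = 67)  | true (Key found)
45           | 1 | 9 | ⌊1 + (9−1)·(45−24)/(99−24)⌋ = ⌊3.24⌋ = 3 | 45 < (L[3] = 67)  | false
             | 1 | 2 | ⌊1 + (2−1)·(45−24)/(56−24)⌋ = ⌊1.66⌋ = 1 | 45 > (L[1] = 24)  | false
             | 2 | 2 | 2                                        | 45 < (L[2] = 56)  | false
             | 2 | 1 |                                          |                   | false (Key not found)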





In the case of key 67 the search was successful. However in the case of key 45, as can be observed
in the last row of the table, the condition (i ≤ j), (i.e.) (2 ≤ 1), failed in the while loop of the
algorithm and hence the search terminates signaling “key not found”.

The worst case complexity of interpolation search is O(n). However, on an average the search
records a brilliant $O(\log_2 \log_2 n)$ complexity.
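A runnable sketch of the probe-position idea follows. It is a variation rather than a transcription of Algorithm 15.4: it uses 0-based indexes, returns -1 for an unsuccessful search, and re-checks on every iteration that K still lies within the current sublist, which keeps the computed position inside the sublist and avoids a division by zero when only equal keys remain. These choices and the function name are assumptions made for illustration.

```python
def interpolation_search(L, K):
    """Return the 0-based index of K in the ascending numeric list L, or -1."""
    i, j = 0, len(L) - 1
    # Continue only while K can still lie inside L[i..j].
    while i <= j and L[i] <= K <= L[j]:
        if L[i] == L[j]:
            mid = i                      # all remaining keys are equal
        else:
            # Probe the position that lies (K - L[i]) / (L[j] - L[i])
            # of the way between i and j, rounded down.
            mid = i + int((j - i) * (K - L[i]) / (L[j] - L[i]))
        if L[mid] == K:
            return mid
        if K < L[mid]:
            j = mid - 1
        else:
            i = mid + 1
    return -1

L = [24, 56, 67, 78, 79, 89, 90, 95, 99]   # list of Example 15.4
print(interpolation_search(L, 67))
print(interpolation_search(L, 45))
```

For the key 67 the sketch probes the elements 79 and then 67, the same sequence as in the example; for the key 45 it stops one step earlier than the pseudocode because 45 is already known to lie outside the remaining sublist {56}.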


Binary Search 15.5 





In the previous section we discussed interpolation search which works on ordered lists and
reports an average case complexity of $O(\log_2 \log_2 n)$. Another efficient search technique that
operates on ordered lists is the binary search also known as logarithmic search or bisection.

A binary search searches for a key K in an ordered list $L = \{K_1, K_2, K_3, \ldots, K_n\}$, $K_1 < K_2 < \cdots < K_n$
of data elements, by halving the search list with each comparison until the key is either found or
not found. The key K is first compared with the median element of the list viz., $K_{mid}$. For a sublist
$\{K_i, K_{i+1}, K_{i+2}, \ldots, K_j\}$, $K_{mid}$ is obtained as the key occurring at the position mid which is
computed as

$mid = \left\lfloor \frac{i + j}{2} \right\rfloor$

The comparison of K with $K_{mid}$ yields the following cases:

If ($K = K_{mid}$) then the binary search is done.
If ($K < K_{mid}$) then continue binary search in the sub list $\{K_i, K_{i+1}, K_{i+2}, \ldots, K_{mid-1}\}$
If ($K > K_{mid}$) then continue binary search in the sub list $\{K_{mid+1}, K_{mid+2}, K_{mid+3}, \ldots, K_j\}$







During the search process, each comparison of key K with K,,,, of the respective sub lists 
results in the halving of the list. In other words with each comparison the search space is reduced 
to half its original length. It is this characteristic that renders the search process to be efficient. 
Contrast this with a sequential search where, in the worst case, the entire list is involved in the search!

Binary search adopts the Divide-and-Conquer method of algorithm design. Divide-and-Conquer 
is an algorithm design technique where to solve a given problem, the problem is first recursively 
divided (Divide) into sub-problems (smaller problem instances). The sub-problems that are small 
enough are easily solved (Conquer) and the solutions combined to obtain the solution to the whole 
problem. Divide-and-Conquer has turned out to be a successful algorithm design technique with 
regard to many problems. In the case of binary search, the divide-and-conquer aspect of the 
technique breaks the list (problem) into two sub lists (sub-problems). However, the key is 
searched for only in one of the sublists hence with every division a portion of the list gets 
discounted. 

Algorithm 15.5 illustrates a recursive procedure for binary search. 


Decision tree for binary search 


The binary search for a key K in the ordered list $L = \{K_1, K_2, K_3, \ldots, K_n\}$, $K_1 < K_2 < \cdots < K_n$ traces a
binary decision tree. Figure 15.1 illustrates the decision tree for n = 15. The first element to be
compared with K in the list $L = \{K_1, K_2, K_3, \ldots, K_{15}\}$ is $K_8$ which becomes the root of the decision
tree. If $K < K_8$ then the next element to be compared is $K_4$ which is the left child of the root. For
the other cases of comparisons it is easy to trace the tree by making use of the following
characteristics:
(i) the indexes of the left and the right child nodes differ by the same amount from that of the
parent node.
For example, in the decision tree shown in Fig. 15.1 the left and right child nodes of the
node $K_{12}$, viz., $K_{10}$ and $K_{14}$ differ from their parent key index by the same amount.
This characteristic renders the search process to be uniform and therefore binary search is also
termed as uniform binary search.
(ii) for n elements where $n = 2^t - 1$, the difference in the indexes of a parent node and its child
nodes follows the sequence $2^0, 2^1, 2^2, \ldots$ from the leaf upwards.
For example, in Fig. 15.1 where $n = 15 = 2^4 - 1$, the difference in index of all the leaf nodes
from their respective parent nodes is $2^0$. The difference in index of all the nodes in level
3 from their respective parent nodes is $2^1$ and so on.
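The two characteristics can be observed by listing the probe positions level by level. The following is a small sketch; the function name `decision_tree_levels` is hypothetical, and positions are 1-based to match the K subscripts used in the text.

```python
def decision_tree_levels(n):
    """Return, level by level, the 1-based positions probed by binary search on n keys."""
    levels = []
    frontier = [(1, n)]                 # sublists still to be split
    while frontier:
        level, next_frontier = [], []
        for low, high in frontier:
            if low > high:              # empty sublist: no decision node
                continue
            mid = (low + high) // 2     # same mid as in the binary search
            level.append(mid)
            next_frontier.append((low, mid - 1))
            next_frontier.append((mid + 1, high))
        if level:
            levels.append(level)
        frontier = next_frontier
    return levels

for depth, level in enumerate(decision_tree_levels(15), start=1):
    print("Level", depth, level)
```

For n = 15 the levels come out as [8], [4, 12], [2, 6, 10, 14] and the eight odd positions, so a parent and its children differ by 4, 2 and 1 from the root downwards.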


Level 1:                      K8                          (children differ in index by 2^2)
Level 2:           K4                   K12               (children differ in index by 2^1)
Level 3:      K2        K6         K10        K14         (children differ in index by 2^0)
Level 4:    K1   K3   K5   K7    K9   K11   K13   K15


Fig. 15.1 Decision tree for binary search







Example 15.5 Consider an ordered list $L = \{K_1, K_2, K_3, \ldots, K_{15}\}$ = {12, 21, 34, 38, 45, 49, 67,
69, 78, 79, 82, 87, 93, 97, 99}. Let us search for the key K = 21 in the list L. The search process is
illustrated in Fig. 15.2. K is first compared with $K_{mid} = K_{\lfloor (1+15)/2 \rfloor} = K_8 = 69$. Since $K < K_{mid}$, the search
continues in the sublist {12, 21, 34, 38, 45, 49, 67}. Now, K is compared with $K_{mid} = K_{\lfloor (1+7)/2 \rfloor} = K_4 = 38$.
Again $K < K_{mid}$ shrinks the search list to {12, 21, 34}. Now finally when K is compared with
$K_{mid} = K_{\lfloor (1+3)/2 \rfloor} = K_2 = 21$, the search is done. Thus in three comparisons we are able to search for
the key K = 21.


Search key K: 21

Step 1: sublist {12, 21, 34, 38, 45, 49, 67, 69, 78, 79, 82, 87, 93, 97, 99} (positions 1 to 15);
        K_mid = K_8 = 69; K < K_mid
Step 2: sublist {12, 21, 34, 38, 45, 49, 67} (positions 1 to 7);
        K_mid = K_4 = 38; K < K_mid
Step 3: sublist {12, 21, 34} (positions 1 to 3);
        K_mid = K_2 = 21; K = K_mid, search successful


Fig. 15.2 Binary search process (Example 15.5)


Let us now search for the key K = 75. Proceeding in a similar manner, K is first compared with
$K_8 = 69$. Since $K > K_8$, the search list is reduced to {78, 79, 82, 87, 93, 97, 99}. Now $K < (K_{12} = 87)$,
hence the search list is reduced further to {78, 79, 82}. Comparing K with $K_{10}$ reduces the search
list to {78} which obviously yields the search to be unsuccessful. In the case of Algorithm 15.5,
at this step of the search, the recursive call to binary_search would have both low = high =
9, resulting in mid = 9. Comparing K with $K_{mid}$ results in the call to binary_search(L, 9, 8,
K). Since the (low > high) condition is satisfied, the algorithm terminates with the ‘key not
found’ message.


Algorithm 15.5: Procedure for binary search


procedure binary_search(L, low, high, K)
/* L[low:high] is a linear ordered sublist of data
   elements. Initially low is set to 1 and high to n. K
   is the key to be searched in the list. */

   if (low > high) then { binary_search = 0;
                          print ("Key not found");
                          exit(); }                 /* key K not found */
   else
   {
      mid = ⌊(low + high)/2⌋;
      case
        : K = L[mid] : { print ("Key found");
                         binary_search = mid;
                         return L[mid]; }
        : K < L[mid] : binary_search = binary_search(L, low, mid-1, K);
        : K > L[mid] : binary_search = binary_search(L, mid+1, high, K);
      endcase
   }
end binary_search.


Considering the decision tree associated with binary search, it is easy to see that in the worst
case the number of comparisons needed to search for a key would be determined by the height
of the decision tree and is therefore given by $O(\log_2 n)$.
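A recursive binary search in the spirit of Algorithm 15.5 might look as follows. This is a sketch under assumed conventions: 0-based indexes, a return value of -1 for an unsuccessful search, and an illustrative parameter order; it probes the same elements as the 1-based description because the floor of the midpoint shifts by exactly one position.

```python
def binary_search(L, K, low=0, high=None):
    """Return the 0-based index of K in the ascending list L, or -1 if absent."""
    if high is None:
        high = len(L) - 1
    if low > high:                       # empty sublist: key not found
        return -1
    mid = (low + high) // 2              # median position of L[low..high]
    if L[mid] == K:
        return mid
    if K < L[mid]:
        return binary_search(L, K, low, mid - 1)    # continue in the left half
    return binary_search(L, K, mid + 1, high)       # continue in the right half

L = [12, 21, 34, 38, 45, 49, 67, 69, 78, 79, 82, 87, 93, 97, 99]   # Example 15.5
print(binary_search(L, 21))
print(binary_search(L, 75))
```

In practice, Python's standard library module `bisect` (for example `bisect.bisect_left`) provides the same halving strategy for sorted lists.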





Fibonacci Search 15.6 


The Fibonacci number sequence is given by {0, 1, 1, 2, 3, 5, 8, 13, 21, ...} and is generated by the
following recurrence relation:

$F_0 = 0$
$F_1 = 1$
$F_i = F_{i-1} + F_{i-2}, \quad i \ge 2$

It is interesting to note that the Fibonacci sequence finds an application in a search technique
termed Fibonacci search. While binary search selects the median of the sublist as its next element
for comparison, the Fibonacci search determines the next element of comparison as dictated by
the Fibonacci number sequence.

Fibonacci search works only on ordered lists and for convenience of description we assume
that the number of elements in the list is one less than a Fibonacci number, (i.e.) $n = F_k - 1$. It
is easy to follow Fibonacci search once the decision tree is traced, which otherwise may look
mysterious!


Decision tree for Fibonacci search 


The decision tree for Fibonacci search satisfies the following characteristic:
If we consider a grandparent, parent and its child nodes and if the difference in index between
the grandparent and the parent is $F_j$ then
(i) if the parent is a left child node then the difference in index between the parent and its child
nodes is $F_{j-1}$, whereas
(ii) if the parent is a right child node then the difference in index between the parent and the
child nodes is $F_{j-2}$.
Let us consider an ordered list $L = \{K_1, K_2, K_3, \ldots, K_n\}$, $K_1 < K_2 < \cdots < K_n$ where $n = F_k - 1$. The
Fibonacci search decision tree for n = 20 where $20 = (F_8 - 1)$ is shown in Fig. 15.3.

The root of the decision tree which is the first element in the list to be compared with key K
during the search is that key $K_i$ whose index i is the largest Fibonacci number not exceeding n.
In the case of n = 20, $K_{13}$ is the root since the largest Fibonacci number not exceeding 20 is 13.

If ($K < K_{13}$) then the next key to be compared is $K_8$. If again ($K < K_8$) then it would be $K_5$ and
so on. Now it is easy to determine the other decision nodes making use of the characteristics
mentioned above. Since child nodes differ from their parent by the same amount, it is easy to see
that the right child of $K_{13}$ should be $K_{18}$ and that of $K_8$ should be $K_{11}$ and so on. Consider the
grandparent-parent combination, $K_8$ and $K_{11}$ respectively. Since $K_{11}$ is the right child of its parent
and the difference between $K_8$ and $K_{11}$ is $F_4$, the same between $K_{11}$ and its two child nodes should
be $F_2$ which is 1. Hence the two child nodes of $K_{11}$ are $K_{10}$ and $K_{12}$. Similarly, considering the
grandparent and parent combination of $K_{18}$ and $K_{16}$ where $K_{16}$ is the left child of its parent and
their difference is given by $F_3$, the two child nodes of $K_{16}$ are given by $K_{15}$ and $K_{17}$ (difference
is $F_2$) respectively.
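The parent and child relationships quoted above can be reproduced with a short sketch. It assumes the usual recursive view of the decision tree for $F_k - 1$ keys (root at index $F_{k-1}$, left subtree built on $F_{k-1} - 1$ keys, right subtree built on $F_{k-2} - 1$ keys shifted past the root); the function names and the dictionary of children are illustrative, not taken from the text.

```python
def fib(k):
    """Return F_k with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a

def fibonacci_tree(k, offset=0, children=None):
    """Build the decision tree over F_k - 1 keys placed at offset+1 .. offset+F_k-1.

    Returns (root, children) where children[node] = (left child, right child)
    and None marks a missing child.
    """
    if children is None:
        children = {}
    if fib(k) - 1 <= 0:                  # no keys: empty subtree
        return None, children
    root = offset + fib(k - 1)
    left, _ = fibonacci_tree(k - 1, offset, children)
    right, _ = fibonacci_tree(k - 2, root, children)
    children[root] = (left, right)
    return root, children

root, children = fibonacci_tree(8)      # n = F_8 - 1 = 20 keys
print("root:", root)
for node in (13, 8, 11, 18, 16):
    print(node, "->", children[node])
```

For n = 20 the sketch gives the root 13 with children 8 and 18, the children 10 and 12 for node 11, and the children 15 and 17 for node 16, in agreement with the description above.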


Fig. 15.3 Decision tree for Fibonacci search 


Algorithm 15.6 illustrates the procedure for Fibonacci search. Here n, the number of data
elements is such that
(i) $F_{k+1} > (n+1)$ and
(ii) $F_k + m = (n+1)$ for some $m \ge 0$, where $F_{k+1}$ and $F_k$ are two consecutive Fibonacci numbers.


Algorithm 15.6: Procedure for Fibonacci search


procedure FIBONACCI_SEARCH(L, n, K)
/* L[1:n] is a linear ordered (non decreasing) list of data
   elements. n is such that F(k+1) > (n+1). Also, F(k) + m = (n+1).
   K is the key to be searched in the list. */

   Obtain the largest Fibonacci number F(k) closest to n+1;
   p = F(k-1);
   q = F(k-2);
   r = F(k-3);
   m = (n+1) - (p+q);
   if (K > L[p]) then p = p + m;
   found = false;
   while ((p ≠ 0) and (not found)) do
      case
        : K = L[p] : { print ("key found");        /* key found */
                       found = true; }
        : K < L[p] : if (r = 0) then p = 0
                     else { p = p - r; t = q; q = r; r = t - r; }
        : K > L[p] : if (q = 1) then p = 0
                     else { p = p + r; q = q - r; r = r - q; }
      endcase
   endwhile
   if (found = false) then print ("key not found");
end FIBONACCI_SEARCH.





Example 15.6 Let us search for the key K = 434 in the ordered list L = {2, 4, 8, 9, 17, 36, 44,
55, 81, 84, 94, 116, 221, 256, 302, 356, 396, 401, 434, 536}. Here n (n = 20), the number of elements,
is such that (i) $F_9 > (n+1)$ and (ii) $F_8 + m = (n+1)$ where m = 0 and n = 20.

The algorithm for Fibonacci search first obtains the largest Fibonacci number closest to n+1,
(i.e.) $F_8$ in this case. It compares K = 434 with the data element with index $F_7$, (i.e.) L[13] = 221.
Since K > L[13], the search list is reduced to L[14:20] = {256, 302, 356, 396, 401, 434, 536}. Now K
compares itself with L[18] = 401. Since K > L[18] the search list is further reduced to L[19:20] =
{434, 536}. Now K is compared with L[20] = 536. Since K < L[20] is true it results in the search list
{434} which when searched yields the search key. The key is successfully found.

Following a similar procedure, searching for 66 in the list yields an unsuccessful search.

The detailed trace of Algorithm 15.6 for the search keys 434 and 66 is shown in Table 15.3.


Table 15.3 Trace of Algorithm 15.6 for the search keys 434 and 66


Search key K   Key comparison        Remarks

434                                  n = 20; m = 0 since $F_8 + 0 = n + 1$
               K > L[13] = 221       Since K > L[p], p = p + m
               K > L[13] = 221
               K > L[18] = 401
               K < L[20] = 536
               K = L[19] = 434       Key is found

66             K < L[13] = 221
               K > L[8] = 55
               K < L[11] = 94
               K < L[10] = 84
               K < L[9] = 81         Since (r = 0), p is set to 0.
                                     Key is not found





An advantage of Fibonacci search over binary search is that while binary search involves 
division which is computationally expensive, during the selection of the next element for key 
comparison, Fibonacci search involves only addition and subtraction. 


15.7 Other Search Techniques


Tree search 


The tree data structures of AVL trees (Sec. 10.3), m-way trees, B trees and tries (Chapter 11), Red-Black
trees (Sec. 12.2) etc., are also candidates for the solution of search related problems. The
inherent search operation that each of these data structures supports can be employed for the
problem of searching.

The techniques of sequential search, interpolation search, binary search and Fibonacci search
are primarily employed for files or groups of records or data elements that can be accommodated
within the high speed internal memory of the computer. Hence these techniques are commonly
referred to as internal searching methods. On the other hand, when the file size is too large to be
accommodated within the memory of the computer, one has to take recourse to external storage
devices such as disks or drums to store the file (see Chapter 17). In such cases, when a search
operation for a key needs to be undertaken, the process involves searching through blocks of
storage spanning across storage areas. Adopting internal searching methods for these cases
would be grossly inefficient. The search techniques offered by m-way trees, B trees, tries
and so on are suitable for such a scenario. Hence these search techniques are referred to as external
searching methods.


Graph search 


The graph data structure and its traversal techniques of Breadth first traversal and Depth first
traversal (Sec. 9.4) can also be employed for search related problems. If the search space is
represented as a graph and the problem involves searching for a key K which is a node in the
graph, either of the two traversals may be undertaken on the graph to look for the key. In such a
case we term the traversal techniques as Breadth first search (see Illustrative Problem 15.6) and
Depth first search (see Illustrative Problem 15.7).


Indexed sequential search 


The Indexed sequential search (see Sec. 15.7) is a successful search technique applicable on files 
that are too large to be accommodated in the internal memory of the computer. Also known as 
Indexed Sequential Access Method (ISAM), the search procedure and its variants have been 
successfully applied to Database systems. 

Considering the fact that the search technique is commonly used on data bases or files which 
span several blocks of storage areas, the technique could be deemed as an external searching 
technique. To search for a key one needs to look into the index to obtain the storage block where 
the associated group of records or elements are available. Once the block is retrieved, the 
retrieval of the record represented by the key merely reduces to a sequential search within the 
block of records for the key. 
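The two stage lookup described above (index first, then a sequential scan of one block) can be sketched in Python. This is an illustrative sketch only: the names `build_index` and `indexed_sequential_search`, the choice of storing the largest key of each block in the index, and the use of a binary search over the index are assumptions, not details given in the text. The blocks here are in-memory lists standing in for storage blocks.

```python
from bisect import bisect_left


def build_index(blocks):
    """Index entry for each block = largest key stored in that block (assumed layout)."""
    return [block[-1] for block in blocks]


def indexed_sequential_search(blocks, index, key):
    """Return (block number, position within block) of key, or None if absent."""
    # Stage 1: consult the index to pick the only block that can hold the key.
    b = bisect_left(index, key)
    if b == len(blocks):
        return None                      # key is larger than every stored key
    # Stage 2: sequential search within the retrieved block.
    for pos, item in enumerate(blocks[b]):
        if item == key:
            return (b, pos)
    return None


blocks = [[2, 4, 8, 9], [17, 36, 44, 55], [81, 84, 94, 116]]
index = build_index(blocks)
print(indexed_sequential_search(blocks, index, 44))   # found in the second block
print(indexed_sequential_search(blocks, index, 66))   # None: not present
```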


Summary


> The problem of search involves retrieving a key from a list of data elements. In the case 
of a successful retrieval the search is deemed to be successful otherwise it is unsuccessful. 

> The search techniques that work on lists or files that can be accommodated within the
internal memory of the computer are called internal searching methods; otherwise they
are called external searching methods.

> Sequential search involves looking for a key in a list L which may or may not be ordered. 
However an ordered sequential search is more efficient than its unordered counterpart. 

> A transpose sequential search sequentially searches for a key in a list but swaps it with the 
predecessor once it is found. This enables efficient search of keys that are repeatedly 
looked for in a list. 

> Interpolation search imitates the kind of search process that one employs while referring 
to a dictionary. The search key is compared with data elements at “calculated positions” 
and the process progresses based on whether the key occurs before or after it. However 
it is essential that the list is ordered. 

> Binary search is a successful and efficient search technique that works on ordered lists. 
The search key is compared with the element at the median of the list. Based on whether 
the key occurs before or after it the search list is reduced and the search process continues 
in a similar fashion in the sublist. 

> Fibonacci search works on ordered lists and employs the Fibonacci number sequence and 
its characteristics to search through the list. 

> Tree data structures viz., AVL trees, tries, m-way search trees, B trees etc., and graphs also 
find applications in search related problems. Indexed sequential search is a popular search 
technique employed in the management of files and databases. 







Illustrative Problems


Problem 15.1 For the list CHANNELS = {AAXN, ZZEE, CCBC, CCNN, DDDN, HHBO, GGOD,
FFAS, NNDT, SSON, CCAF, NNGE, BBBC, PPRO} trace transpose sequential search for the
elements in the list SELECT_CHANL = {DDDN, NNDT, DDDN, PPRO, DDDN, NNDT}. Obtain the
number of comparisons made during the search for each of the elements in the list SELECT_CHANL.


Solution: The trace of transpose sequential search for the search of elements in the list
SELECT_CHANL over the list CHANNELS is presented in the following table:


Search key: DDDN    Number of element comparisons made during the search: 5
   List L before search: {AAXN, ZZEE, CCBC, CCNN, DDDN, HHBO, GGOD, FFAS, NNDT, SSON, CCAF, NNGE, BBBC, PPRO}
   List L after search:  {AAXN, ZZEE, CCBC, DDDN, CCNN, HHBO, GGOD, FFAS, NNDT, SSON, CCAF, NNGE, BBBC, PPRO}

Search key: NNDT    Number of element comparisons made during the search: 9
   List L before search: {AAXN, ZZEE, CCBC, DDDN, CCNN, HHBO, GGOD, FFAS, NNDT, SSON, CCAF, NNGE, BBBC, PPRO}
   List L after search:  {AAXN, ZZEE, CCBC, DDDN, CCNN, HHBO, GGOD, NNDT, FFAS, SSON, CCAF, NNGE, BBBC, PPRO}

Search key: DDDN    Number of element comparisons made during the search: 4
   List L before search: {AAXN, ZZEE, CCBC, DDDN, CCNN, HHBO, GGOD, NNDT, FFAS, SSON, CCAF, NNGE, BBBC, PPRO}
   List L after search:  {AAXN, ZZEE, DDDN, CCBC, CCNN, HHBO, GGOD, NNDT, FFAS, SSON, CCAF, NNGE, BBBC, PPRO}

Search key: PPRO    Number of element comparisons made during the search: 14
   List L before search: {AAXN, ZZEE, DDDN, CCBC, CCNN, HHBO, GGOD, NNDT, FFAS, SSON, CCAF, NNGE, BBBC, PPRO}
   List L after search:  {AAXN, ZZEE, DDDN, CCBC, CCNN, HHBO, GGOD, NNDT, FFAS, SSON, CCAF, NNGE, PPRO, BBBC}

Search key: DDDN    Number of element comparisons made during the search: 3
   List L before search: {AAXN, ZZEE, DDDN, CCBC, CCNN, HHBO, GGOD, NNDT, FFAS, SSON, CCAF, NNGE, PPRO, BBBC}
   List L after search:  {AAXN, DDDN, ZZEE, CCBC, CCNN, HHBO, GGOD, NNDT, FFAS, SSON, CCAF, NNGE, PPRO, BBBC}

Search key: NNDT    Number of element comparisons made during the search: 8
   List L before search: {AAXN, DDDN, ZZEE, CCBC, CCNN, HHBO, GGOD, NNDT, FFAS, SSON, CCAF, NNGE, PPRO, BBBC}
   List L after search:  {AAXN, DDDN, ZZEE, CCBC, CCNN, HHBO, NNDT, GGOD, FFAS, SSON, CCAF, NNGE, PPRO, BBBC}




Problem 15.2 For the ordered list L = {B, D, F, G, H, I, K, L, M, N, O, P, Q, T, U, V, W, X,
Y, Z} undertake interpolation search (trace of Algorithm 15.4) for keys H and Y. Make use
of the respective alphabetical sequence number for the keys, during the computation of the
interpolation function.


Solution: The table given below illustrates the trace of the algorithm during the search for keys
H and Y.


Search key K   Interpolated position     Key comparison       Remarks

H              ⌊5.75⌋ = 5                H = (L[5] = H)       Key found
Y              ⌊19.20⌋ = 19              Y = (L[19] = Y)      Key found





Problem 15.3 For the ordered list L and the search keys given in Illustrative Problem 15.2 
trace the steps of binary search during the search process. 


Solution: The binary search processes for the search keys H and Y over the list L are shown in 
Fig. I 15.3. The median of the list (mid) during each step of the search process and the key 
comparisons made are also shown. While H calls for only 2 key comparisons, Y calls for 4 key 
comparisons. 








Search key: H

Index:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
List L:   B  D  F  G  H  I  K  L  M  N  O  P  Q  T  U  V  W  X  Y  Z

H < N (mid = 10)

Sublist:  B  D  F  G  H  I  K  L  M

H = H (mid = 5)

Key found

Fig. I 15.3


Problem 15.4 For the ordered list L shown in Illustrative Problem 15.2 trace the steps of 
binary search for the search key R. 


Solution: The steps of the binary search process for the search key R are shown in Fig. I 15.4. The
median of the list during each step and the key comparisons made are shown in the figure. The
search is deemed unsuccessful.


Search key: R

Index:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
List L:   B  D  F  G  H  I  K  L  M  N  O  P  Q  T  U  V  W  X  Y  Z

Key not found

Fig. I 15.4
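The comparison counts quoted in Illustrative Problems 15.3 and 15.4 can be checked with a small Python sketch of binary search. The function name `binary_search`, the 1-based index it returns (0 for an unsuccessful search), and the choice of the median as the floor of (low + high)/2 are assumptions; the last of these agrees with the values mid = 10 and mid = 5 shown for the key H.

```python
def binary_search(L, K):
    """Return (1-based index of K or 0, number of elements K was compared with)."""
    low, high = 1, len(L)
    comparisons = 0
    while low <= high:
        mid = (low + high) // 2        # median of the current sublist
        comparisons += 1
        if K == L[mid - 1]:
            return mid, comparisons    # key found
        if K < L[mid - 1]:
            high = mid - 1             # continue in the left sublist
        else:
            low = mid + 1              # continue in the right sublist
    return 0, comparisons              # key not found


letters = list("BDFGHIKLMNOPQTUVWXYZ")
for key in ("H", "Y", "R"):
    print(key, binary_search(letters, key))
```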


Problem 15.5 Given the ordered list L = { 2, 4, 8, 9, 17, 36, 44, 55, 65, 100} trace the steps 
of the Fibonacci search algorithm (Algorithm 15.6) for the search key 100. 


Solution: The number of data elements in the list L is n = 10 and n is such that $F_7 > (n+1)$ and
$F_6 + m = (n+1)$. Here, $F_6$ and $F_7$ are the two consecutive Fibonacci numbers between which n
lies and m is 3. The trace of the Fibonacci search is shown below:


Search key K   Key comparison        Remarks

100                                  n = 10; m = 3
               K > L[5] = 17         Since K > L[p], p = p + m = 5 + 3 = 8
               K > L[8] = 55
               K = L[10] = 100       Key found









Problem 15.6 For the undirected graph G (Fig. 9.26) shown in Example 9.1 and reproduced
here for convenience, undertake Breadth first search for the key 9, by refining the Breadth first
traversal algorithm (Algorithm 9.1).

[Figure: (a) Graph G (b) Adjacency list of Graph G]


Solution: The Breadth first search procedure is derived from Algorithm 9.1 (procedure BFT(s)
where s is the start node of the graph) by replacing the procedure parameters as procedure
BFT(s, K) where K is the search key. Also the statement print (s) is replaced by

if (s = K) then { print(“key found”); exit();}.

Unsuccessful searches may be trapped by including the statement

if EMPTY_QUEUE(Q) then print(“key not found”);

soon after the while loop in procedure BFT(s, K).
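The refinement described above can be sketched in Python. Since Algorithm 9.1 and Fig. 9.26 are not reproduced here, the adjacency list `adj` below is a small hypothetical graph, and the function name `breadth_first_search` and its return value (a found flag plus the order in which vertices became the current vertex) are assumptions made for illustration.

```python
from collections import deque


def breadth_first_search(adj, start, key):
    """Breadth first traversal from start that stops as soon as key is the current vertex."""
    visited = {start}                  # visited flag is set before enqueueing
    queue = deque([start])
    order = []                         # vertices in the order they are processed
    while queue:
        s = queue.popleft()
        order.append(s)
        if s == key:                   # replaces print(s) of the traversal
            return True, order
        for v in adj[s]:
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return False, order                # queue is empty: key not found


# Hypothetical undirected graph, not the graph G of Fig. 9.26.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(breadth_first_search(adj, 1, 5))
print(breadth_first_search(adj, 1, 9))
```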

The trace of the breadth first search procedure for the search key 9 is shown below: 


[Table: trace of the search for key K = 9 beginning at vertex 1 (start vertex). Columns: Search key K; Current vertex; Queue Q; Status of the visited flag (0/1) of the vertices (1-10) of graph G.]





During the expansion of the current vertex, the algorithm sets the visited flag of the vertices 
visited to 1 before they are enqueued into the queue Q. Column 4 of the table illustrates the status 
of the visited flags. Once the current vertex reaches vertex 9, the key is found and the search is 
deemed successful. 


Problem 15.7 For the undirected graph G (Fig. 9.26) shown in Example 9.1, and reproduced 
in Illustrative Problem 15.6 for convenience, undertake Depth first search for the key 9, by refining 
the Depth first traversal algorithm (Algorithm 9.2). Trace the tree of recursive calls. 


Solution: The recursive depth first search procedure can be derived from procedure DFT(s)
(Algorithm 9.2), where s is the start node of the graph, by replacing the procedure parameters as
procedure DFT(s, K) where K is the search key. Also the statement print (s) is replaced by
if (s = K) then { print(“key found”); exit( );}.

The tree of recursive calls for the depth first search of key 9 is shown in Fig. I 15.7.

Each solid rectangular box indicates a call to the procedure DFT(S, K). In the case of depth 
first search, as soon as a vertex is visited it is checked against the search key K. If the search key 
is found, the recursive procedure terminates with the message “key found”. 

The broken line rectangular box indicates a “pending” call to the procedure DFT (S, K). For 
example, during the call DFT(5, 9), vertex 5 has two adjacent unvisited nodes viz., 4 and 9. Since 
depth first search proceeds with the processing of vertex 4, vertex 9 is kept waiting. 





[Figure: tree of recursive calls, listed in the order in which the calls appear.
Solid boxes: DFT(1, 9), VISITED(1) = 1; DFT(5, 9), VISITED(5) = 1; DFT(4, 9), VISITED(4) = 1;
DFT(2, 9), VISITED(2) = 1; DFT(9, 9), VISITED(9) = 1, KEY FOUND.
Broken line (pending) boxes: DFT(6, 9) and DFT(7, 9) alongside DFT(5, 9); DFT(9, 9) alongside DFT(4, 9).]

Fig. I 15.7


During the call to the procedure DFT(9, 9), the search key is found in the graph. An
unsuccessful search is signaled when all the visited flags of the vertices have been set to 1 and
the search key is nowhere in sight.
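A recursive depth first search of the kind described in this solution can be sketched in Python. As Algorithm 9.2 and Fig. 9.26 are not reproduced here, the graph `adj` is a small hypothetical example and the name `depth_first_search` is an assumption. Instead of calling exit(), the sketch returns True up the chain of recursive calls once the key is found.

```python
def depth_first_search(adj, s, key, visited=None):
    """Recursive depth first traversal from s that stops once key is visited."""
    if visited is None:
        visited = set()
    visited.add(s)                     # VISITED(s) = 1
    if s == key:                       # replaces print(s) of the traversal
        return True
    for v in adj[s]:
        # Unvisited neighbours after the first one stay "pending"
        # until the call on the earlier neighbour returns.
        if v not in visited and depth_first_search(adj, v, key, visited):
            return True
    return False                       # every reachable vertex visited, key not seen


# Hypothetical undirected graph, not the graph G of Fig. 9.26.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
print(depth_first_search(adj, 1, 5))   # True
print(depth_first_search(adj, 1, 9))   # False
```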


Review Questions


1. Binary search is termed uniform binary search since its decision tree holds the following
characteristic:
(a) the indexes of the left and the right child nodes differ by the same amount from that
of the parent node
(b) the list gets exactly halved in each phase of the search
(c) the height of the decision tree is log n.
(d) each parent node of the decision tree has two child nodes.


2. In the context of binary search, state whether true or false:
(i) the difference in index of all the leaf nodes from their respective parent nodes is $2^0$.
(ii) the height of the decision tree is n.
(a) (i) true (ii) true (b) (i) true (ii) false (c) (i) false (ii) true (d) (i) false (ii) false


3. For a list $L=\{K_1, K_2, K_3, \ldots, K_{33}\}$, $K_1 < K_2 < \ldots < K_{33}$, undertaking Fibonacci search for a key K
would yield a decision tree whose root node is given by
(a) Ky6¢ (b) Ky7 (c) K; (d) K3: 


Which among the following search techniques does not report a worst case time complexity
of O(n)?
(a) linear search (b) interpolation search
(c) transpose sequential search (d) binary search

Which among the following search techniques works on unordered lists?
(a) Fibonacci search (b) interpolation search
(c) transpose sequential search (d) binary search

What are the advantages of binary search over sequential search?

When is a transpose sequential search said to be most successful?

What is the principle behind interpolation search?

Distinguish between internal searching and external searching.

What are the characteristics of the decision tree of Fibonacci search?

How is breadth first search evolved from breadth first traversal of a graph?

16. For the following search list undertake (i) linear ordered search (ii) binary search in the data
list given. Tabulate the number of comparisons made for each key in the search list.
Search list: {766, 009, 999, 238}
Data list: {111 453 231 112 679 238 876 655 766 877 988 009 122 233 344 566}

17. For the given data list and search list, tabulate the number of comparisons made when
(i) a transpose sequential search and (ii) interpolation search is undertaken on the keys
belonging to the search list.
Data list: {pin, ink, pen, clip, ribbon, eraser, duster, chalk, pencil, paper,
stapler, pot, scale, calculator}
Search list: {pen, clip, paper, pen, calculator, pen}

18. Undertake Fibonacci search of the key K = 67 in the list {11, 89, 34, 15, 90, 67, 88, 01, 36,
98, 76, 50}. Trace the decision tree for the search.

19. Perform (i) Breadth first search and (ii) Depth first search, on the graph given in Fig. R 15.19
for the key V.





Fig. R 15.19 







Programming Assignments


1. Implement binary search and Fibonacci search algorithms ( Algorithms 15.5 and 15.6) on 
an ordered list. For the list L = { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20} 
undertake search for the elements in the list { 3, 18, 1, 25}. Compare the number of key 
comparisons made during the searches. 

2. Execute an online dictionary ( with a limited list of words) which makes use of interpolation 
search to search through the dictionary given a word. Refine the program to correct any 
misspelled word with the nearest and/or the correct word from the dictionary. 

3. L is a linear list of data elements. Implement the list as

(i) a linear open addressed hash table using an appropriate hash function of your choice, 
and 
(ii) an ordered list. 
Search for a list of keys on the representations (i) and (ii) using (a) hashing and (b) binary 
search, respectively. Compare the performance of the two methods over the list L. 

4. Implement a procedure to undertake search for keys L and M in the graph shown in 

Fig. P 15.4 





Fig. P 15.4 


CHAPTER


INTERNAL
SORTING


16.1 Introduction
16.2 Bubble Sort
16.3 Insertion Sort
16.4 Selection Sort
16.5 Merge Sort
16.6 Shell Sort
16.7 Quick Sort
16.8 Heap Sort
16.9 Radix Sort


16.1 Introduction


Sorting in the English language refers to separating or arranging things according to different
classes. However, in computer science, sorting, also referred to as ordering, deals with arranging
elements of a list or a set or records of a file in the ascending or descending order.
In the case of sorting a list of alphabetical or numerical or alphanumerical elements, the
elements are arranged in their ascending or descending order based on their alphabetical or
numerical sequence number. The sequence is also referred to as a collating sequence. In the case
of sorting a file of records, one or more fields of the records are chosen as the key based on which
the records are arranged in the ascending or descending order.
Examples of lists before and after sorting are shown below:


Unsorted lists                                    Sorted lists
{34, 12, 78, 65, 90, 11, 45}                      {11, 12, 34, 45, 65, 78, 90}
{tea, coffee, cocoa, milk, malt, chocolate}       {chocolate, cocoa, coffee, malt, milk, tea}
{n12x, m34b, n24x, a78h, g56v, m12k, k34d}        {a78h, g56v, k34d, m12k, m34b, n12x, n24x}


Sorting has acquired immense significance in the discipline of computer science. Several data 
structures and algorithms display efficient performance when presented with sorted data sets. 

Many different sorting algorithms have been invented each having its own advantages and 
disadvantages. These algorithms may be classified into families such as sorting by exchange, sorting 
by insertion, sorting by distribution, sorting by selection and so on. However in many cases, it is 
difficult to classify the algorithms as belonging to only a specific family. 

A sorting technique is said to be stable if keys that are equal retain their relative orders
of occurrence even after sorting. In other words, if $K_1$, $K_2$ are two keys such that $K_1 = K_2$, and
$p(K_1) < p(K_2)$ where $p(K_i)$ is the position index of the keys in the unsorted list, then after sorting,
$p'(K_1) < p'(K_2)$ where $p'(K_i)$ is the index position of the keys in the sorted list.

If the list of data or records to be sorted is small enough to be accommodated in the internal
memory of the computer, then the sorting is referred to as internal sorting. On the other hand, if the data
list or records to be sorted are voluminous and are accommodated in external storage devices
such as tapes, disks and drums, then the sorting undertaken is referred to as external sorting.
External sorting methods are quite different from internal sorting methods and are discussed in 
Chapter 17. 







In this chapter we discuss the internal sorting techniques of Bubble Sort, Insertion Sort, 
Selection sort, Merge Sort, Shell sort, Quick Sort, Heap Sort and Radix Sort. 


16.2 Bubble Sort





Bubble sort belongs to the family of sorting by exchange or transposition, where during the sorting
process pairs of elements that are out of order are interchanged until the whole list is ordered.
Given an unordered list $L=\{K_1, K_2, K_3, \ldots, K_n\}$, bubble sort orders the elements in their ascending
order, i.e., $L=\{K_1, K_2, K_3, \ldots, K_n\}$, $K_1 \le K_2 \le \ldots \le K_n$.

Given the unordered list $L=\{K_1, K_2, K_3, \ldots, K_n\}$ of keys, bubble sort compares pairs of
adjacent elements $K_i$ and $K_j$ (j = i+1), swapping them if $K_i > K_j$. At the end of the first pass of
comparisons, the largest element in the list L moves to the last position in the list. In the next pass, the sublist
$\{K_1, K_2, K_3, \ldots, K_{n-1}\}$ is considered for sorting. Once again the pair wise comparison of elements
in the sublist results in the next largest element floating to the last position of the sublist. Thus
in (n-1) passes, where n is the number of elements in the list, the list L is sorted. The sorting is
called bubble sorting for the reason that, with each pass, the next largest element of the list floats
or “bubbles” to its appropriate position in the sorted list.

Algorithm 16.1 illustrates the working of bubble sort. 


Algorithm 16.1: Procedure for Bubble sort 


procedure BUBBLE_SORT(L, n)
/* L[1:n] is an unordered list of data elements to be
   sorted in the ascending order */
   for i = 1 to n-1 do          /* n-1 passes */
      for j = 1 to n-i do
         if (L[j] > L[j+1]) then swap(L[j], L[j+1]);
                                /* swap pair wise elements */
      end   /* the next largest element “bubbles” to the last position */
   end
end BUBBLE_SORT.
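Algorithm 16.1 can be sketched in Python as follows. The function name `bubble_sort` and the extra list of per-pass snapshots it returns are assumptions added so that the passes can be inspected; the two nested loops mirror the i and j loops of the procedure, shifted to Python's 0-based indexing.

```python
def bubble_sort(L):
    """Sort L in place in ascending order; return a snapshot of L after each pass."""
    n = len(L)
    passes = []
    for i in range(1, n):              # n-1 passes
        for j in range(0, n - i):      # compare L[j] with L[j+1] in the unsorted part
            if L[j] > L[j + 1]:        # strict '>' leaves equal keys in place (stability)
                L[j], L[j + 1] = L[j + 1], L[j]
        passes.append(list(L))         # the next largest element has bubbled to the end
    return passes


data = [92, 78, 34, 23, 56, 90, 17, 52, 67, 81, 18]
for number, snapshot in enumerate(bubble_sort(data), start=1):
    print(number, snapshot)
```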





Example 16.1 Let L = {92, 78, 34, 23, 56, 90, 17, 52, 67, 81, 18} be an unordered list. As the 
first step in the first pass of bubble sort, 92 is compared with 78. Since 92 > 78, the elements are 
swapped yielding the list { 78, 92, 34, 23, 56, 90, 17, 52, 67, 81, 18}. The swapped elements are 
shown in bold. Now the pair 92 and 34 are compared resulting in a swap which yields the list 
{78, 34, 92, 23, 56, 90, 17, 52, 67, 81, 18}. It is easy to see that at the end of pass one, the largest 
element of the list viz., 92 would have moved to the last position in the list. At the end of pass 
one, the list would be { 78, 34, 23, 56, 90, 17, 52, 67, 81, 18, 92}. 

In the second pass the list considered for sorting discounts the last element viz., 92 since 92 
has found its appropriate position in the sorted list. At the end of the second pass, the next largest 
element viz., 90 would have moved to the end of the list. The partially sorted list at this point 
would be { 34, 23, 56, 78, 17, 52, 67, 81, 18, 90, 92}. The elements shown in grey indicate elements 
discounted from the sorting process. In pass 10 the whole list would be completely sorted. 







The trace of algorithm BUBBLE_SORT (Algorithm 16.1) over L is shown in Table 16.1. Here i
keeps count of the passes and j keeps track of the pair wise element comparisons within a pass.
The lower (l) and upper (u) bounds of the loop controlled by j in each pass are shown as l..u.
In the list L at the end of pass i, the last i elements are those discounted from the rest of the
sorting process.


Table 16.1 Trace of Algorithm 16.1 over the list L = {92, 78, 34, 23, 56, 90, 17, 52, 67, 81, 18}


(Pass) i    j        List L at the end of pass i

1           1..10    {78, 34, 23, 56, 90, 17, 52, 67, 81, 18, 92}
2           1..9     {34, 23, 56, 78, 17, 52, 67, 81, 18, 90, 92}
3           1..8     {23, 34, 56, 17, 52, 67, 78, 18, 81, 90, 92}
4           1..7     {23, 34, 17, 52, 56, 67, 18, 78, 81, 90, 92}
5           1..6     {23, 17, 34, 52, 56, 18, 67, 78, 81, 90, 92}
6           1..5     {17, 23, 34, 52, 18, 56, 67, 78, 81, 90, 92}
7           1..4     {17, 23, 34, 18, 52, 56, 67, 78, 81, 90, 92}
8           1..3     {17, 23, 18, 34, 52, 56, 67, 78, 81, 90, 92}
9           1..2     {17, 18, 23, 34, 52, 56, 67, 78, 81, 90, 92}
10          1..1     {17, 18, 23, 34, 52, 56, 67, 78, 81, 90, 92}





Stability and performance analysis 


Bubble sort is a stable sort since equal keys do not undergo swapping, as can be observed in 
Algorithm 16.1, and this contributes to the keys maintaining their relative orders of occurrence 
in the sorted list. 


Example 16.2 Consider the unordered list $L = \{7^1, 7^2, 7^3, 6\}$. The repeating keys have been
distinguished using their orders of occurrence as superscripts. The partially sorted lists at the end
of each pass of the bubble sort algorithm are shown below:

Pass 1: $\{7^1, 7^2, 6, 7^3\}$
Pass 2: $\{7^1, 6, 7^2, 7^3\}$
Pass 3: $\{6, 7^1, 7^2, 7^3\}$
Observe how the equal keys $7^1$, $7^2$, $7^3$ maintain their relative orders of occurrence in the sorted
list as well, verifying the stability of bubble sort.
The time complexity of bubble sort in terms of key comparisons is given by $O(n^2)$. It is easy
to see this since the procedure involves two loops with their total frequency count given by $O(n^2)$.


16.3 Insertion Sort





Insertion sort as the name indicates belongs to the family of sorting by insertion which is based on 
the principle that a new key K is inserted at its appropriate position in an already sorted sub list. 





Given an unordered list $L=\{K_1, K_2, K_3, \ldots, K_n\}$, insertion sort employs the principle of
constructing the list $L=\{K_1, K_2, K_3, \ldots, K_i, K, K_j, K_{j+1}, \ldots, K_n\}$, $K_1 \le K_2 \le \ldots \le K_i$, and inserting a key
K at its appropriate position by comparing it with its sorted sublist of predecessors
$\{K_1, K_2, K_3, \ldots, K_i\}$, $K_1 \le K_2 \le \ldots \le K_i$, for every key K ($K = K_i$, i = 2, 3, ..., n) belonging to the unordered
list L.

In the first pass of insertion sort, $K_2$ is compared with its sorted sublist of predecessors viz.,
$K_1$. $K_2$ inserts itself at the appropriate position to obtain the sorted sublist $\{K_1, K_2\}$. In the second
pass, $K_3$ compares itself with its sorted sublist of predecessors viz., $\{K_1, K_2\}$ to insert itself at its
appropriate position yielding the sorted list $\{K_1, K_2, K_3\}$ and so on. In the (n-1)th pass, $K_n$
compares itself with its sorted sublist of predecessors $\{K_1, K_2, \ldots, K_{n-1}\}$ and having inserted itself
at the appropriate position yields the final sorted list $L = \{K_1, K_2, K_3, \ldots, K_i, \ldots, K_j, \ldots, K_n\}$,
$K_1 \le K_2 \le K_3 \le \ldots \le K_i \le \ldots \le K_j \le \ldots \le K_n$. Since each key K finds its appropriate position in the sorted list, such
a technique is referred to as sinking or sifting technique.

Algorithm 16.2 illustrates the working of Insertion sort. The for loop in the algorithm keeps 
count of the passes and the while loop implements the comparison of the key key with its sorted 
sublist of predecessors. So long as the preceding element in the sorted sublist is greater than key 
the swapping of the element pair is done. If the preceding element in the sorted sublist is less 
than or equal to key, then key is left at its current position and the current pass terminates. 


Example 16.3 Let L = {16, 36, 4, 22, 100, 1, 54} be an unordered list of elements. The various 
passes of the insertion sort procedure are shown below. The snapshots of the list before and after 
each pass is shown. The key chosen for insertion in each pass is shown in bold and the sorted 
sublist of predecessors against which the key is compared are shown in brackets. 


Pass 1 (Insert 36) { [16] 36, 4, 22, 100, 1, 54} 
After Pass 1 { [16 36] 4, 22, 100, 1, 54} 
Pass 2 (Insert 4) { [16 36] 4, 22, 100, 1, 54} 
After Pass 2 { [4 16 36] 22, 100, 1, 54} 
Pass 3 (Insert 22) { [4 16 36] 22, 100, 1, 54} 
After Pass 3 { [4 16 22 36] 100, 1, 54} 
Pass 4 (Insert 100) { [4 16 22 36] 100, 1, 54} 
After Pass 4 { [4 16 22 36 100] 1, 54} 
Pass 5 (Insert 1) { [4 16 22 36 100] 1, 54} 
After Pass 5 { [1 4 16 22 36 100] 54} 
Pass 6 (Insert 54) { [1 4 16 22 36 100] 54} 
After Pass 6 { [1 4 16 22 36 54 100]} 


Algorithm 16.2: Procedure for Insertion sort 


procedure INSERTION_SORT(L, n)
/* L[1:n] is an unordered list of data elements to be sorted
   in the ascending order */

   for i = 2 to n do            /* n-1 passes */
      key = L[i];               /* key is the key to be inserted
                                   and position its location in the
                                   unordered list */
      position = i;
                                /* compare key with its sorted
                                   sublist of predecessors for insertion
                                   at the appropriate position */
      while (position > 1) and (L[position-1] > key) do
         L[position] = L[position-1];
         position = position - 1;
         L[position] = key;
      end
   end

end INSERTION_SORT.
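A Python sketch of Algorithm 16.2 is given below. The function name `insertion_sort` and the optional `key` argument (used only to demonstrate stability on tagged records) are assumptions. The sketch writes the inserted element once after the shifting loop, which has the same effect as the element-by-element swap described in the text.

```python
def insertion_sort(L, key=lambda item: item):
    """Sort L in place in ascending order of key(item) and return L."""
    for i in range(1, len(L)):         # n-1 passes; L[0:i] is already sorted
        current = L[i]                 # the element to be inserted
        position = i
        # Shift larger predecessors one place right. The strict '>' means
        # equal keys are never moved past each other, so the sort is stable.
        while position > 0 and key(L[position - 1]) > key(current):
            L[position] = L[position - 1]
            position -= 1
        L[position] = current
    return L


print(insertion_sort([16, 36, 4, 22, 100, 1, 54]))

# Records are (key, order of occurrence); sorting on the key alone
# keeps equal keys in their original relative order.
records = [(3, 1), (1, 1), (2, 1), (3, 2), (3, 3), (2, 2)]
print(insertion_sort(records, key=lambda rec: rec[0]))
```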





Stability and performance analysis 


Insertion sort is a stable sort. It is evident from the algorithm that the insertion of key K at its
appropriate position in the sorted sublist affects the position index of the elements in the sublist
only so long as the elements in the sorted sublist are greater than K. When the elements are less than
or equal to the key K, there is no displacement of elements and this contributes to retaining the
original order of keys which are equal, in the sorted sublists.


Example 16.4 Consider the list $L = \{3^1, 1, 2^1, 3^2, 3^3, 2^2\}$ where the repeated keys have been
superscripted with numbers indicative of their relative orders of occurrence. The keys for insertion
are shown in bold and the sorted sublists are bracketed.

The passes of the insertion sort are shown below:


Pass 1 (Insert $1$)        $\{[3^1]\ 1, 2^1, 3^2, 3^3, 2^2\}$
After Pass 1             $\{[1\ 3^1]\ 2^1, 3^2, 3^3, 2^2\}$
Pass 2 (Insert $2^1$)      $\{[1\ 3^1]\ 2^1, 3^2, 3^3, 2^2\}$
After Pass 2             $\{[1\ 2^1\ 3^1]\ 3^2, 3^3, 2^2\}$
Pass 3 (Insert $3^2$)      $\{[1\ 2^1\ 3^1]\ 3^2, 3^3, 2^2\}$
After Pass 3             $\{[1\ 2^1\ 3^1\ 3^2]\ 3^3, 2^2\}$
Pass 4 (Insert $3^3$)      $\{[1\ 2^1\ 3^1\ 3^2]\ 3^3, 2^2\}$
After Pass 4             $\{[1\ 2^1\ 3^1\ 3^2\ 3^3]\ 2^2\}$
Pass 5 (Insert $2^2$)      $\{[1\ 2^1\ 3^1\ 3^2\ 3^3]\ 2^2\}$
After Pass 5             $\{[1\ 2^1\ 2^2\ 3^1\ 3^2\ 3^3]\}$


The stability of insertion sort can be easily verified on this example. Observe how keys which 
are equal maintain their original relative orders of occurrence in the sorted list. 

The worst case performance of insertion sort occurs when the elements in the list are already
sorted in their descending order. It is easy to see that in such a case every key that is to be inserted
has to move to the front of the list and therefore undertakes the maximum number of
comparisons. Thus if the list $L=\{K_1, K_2, K_3, \ldots, K_n\}$, $K_1 \ge K_2 \ge \ldots \ge K_n$, is to be insertion sorted then
the number of comparisons for the insertion of key $K_i$ would be (i-1) since $K_i$ would swap
positions with each of the (i-1) keys occurring before it until it moves to position 1. Therefore the
total number of comparisons for inserting each of the keys is given by

$$1 + 2 + 3 + \cdots + (n-1) = \frac{(n-1)(n)}{2} = O(n^2)$$

The best case complexity of insertion sort arises when the list is already sorted in the ascending
order. In such a case the complexity in terms of comparisons is given by O(n).
The average case performance of insertion sort reports $O(n^2)$ complexity.





16.4 Selection Sort


Selection sort is built on the principle of repeated selection of elements satisfying a specific
criterion to aid the sorting process.

The steps involved in the sorting process are listed below:

(i) Given an unordered list $L=\{K_1, K_2, K_3, \ldots, K_i, \ldots, K_n\}$, select the minimum key K.

(ii) Swap K with the element in the first position of the list L, viz., $K_1$. By doing so the minimum
element of the list has secured its rightful position of number one in the sorted list. This step
is termed pass 1.

(iii) Exclude the first element and select the minimum element K from amongst the remaining
elements of the list L. Swap K with the element in the second position of the list viz., $K_2$.
This is termed pass 2.

(iv) Exclude the first two elements which have occupied their rightful positions in the sorted list
L. Repeat the process of selecting the next minimum element and swapping it with the
appropriate element, until the entire list L gets sorted in the ascending order. The entire
sorting gets done in (n-1) passes.

Selection sort can also undertake sorting in the descending order by selecting the maximum
element instead of the minimum element in each pass and swapping it with the element at the
front of the portion of the list L that is yet to be sorted.

Algorithm 16.3 illustrates the working of selection sort. The procedure FIND MINIMUM(L, i, n)
selects the minimum element from the array L[i:n] and returns the position index of the minimum
element to procedure SELECTION SORT. The for loop in the SELECTION SORT procedure
represents the (n-1) passes needed to sort the array L[1:n] in the ascending order. Function swap
swaps the elements input to it.


Algorithm 16.3: Procedure for Selection sort 


procedure SELECTION SORT(L, n)
/* L[1:n] is an unordered list of data elements to be
   sorted in the ascending order */

   for i = 1 to n-1 do
      minimum_index = FIND MINIMUM(L, i, n);   /* find minimum element
                         of the list L[i:n] and store the position index of
                         the element in minimum_index */

      swap(L[i], L[minimum_index]);

   end

end SELECTION SORT


procedure FIND MINIMUM(L, i, n)
/* the position index of the minimum element in the array
   L[i:n] is returned */
   min_indx = i;
   for j = i+1 to n do
      if (L[j] < L[min_indx]) then min_indx = j;
   end
   return (min_indx)

end FIND MINIMUM
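
The two procedures above can be sketched in Python as follows. This is a minimal illustration rather than the book's own code: the names `find_minimum` and `selection_sort` are chosen here, and the list is indexed from 0 instead of 1.

```python
def find_minimum(L, i):
    """Return the index of the smallest element in L[i:]."""
    min_index = i
    for j in range(i + 1, len(L)):
        if L[j] < L[min_index]:
            min_index = j
    return min_index


def selection_sort(L):
    """Sort the list L in place in ascending order and return it."""
    for i in range(len(L) - 1):        # (n-1) passes
        m = find_minimum(L, i)
        L[i], L[m] = L[m], L[i]        # move the minimum of L[i:] to position i
    return L


print(selection_sort([71, 17, 86, 100, 54, 27]))
```

Each call to `find_minimum` scans the unsorted tail once, which is the source of the quadratic number of comparisons.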


Example 16.5 Let L = {71, 17, 86, 100, 54, 27} be an unordered list of elements. Each pass
of selection sort is traced below. For every pass the minimum element selected and the swap
performed are listed. In pass i, the first (i-1) elements of the list are already in their final
positions and are excluded from the selection.


Pass   List L (During Pass)          Minimum   Swap              List L (After Pass)
1      {71, 17, 86, 100, 54, 27}     17        17 and 71         {17, 71, 86, 100, 54, 27}
2      {17, 71, 86, 100, 54, 27}     27        27 and 71         {17, 27, 86, 100, 54, 71}
3      {17, 27, 86, 100, 54, 71}     54        54 and 86         {17, 27, 54, 100, 86, 71}
4      {17, 27, 54, 100, 86, 71}     71        71 and 100        {17, 27, 54, 71, 86, 100}
5      {17, 27, 54, 71, 86, 100}     86        86 with itself    {17, 27, 54, 71, 86, 100} (Sorted list)


Stability and performance analysis 


Selection sort is not stable. Example 16.6 illustrates a case. The computationally expensive portion
of selection sort occurs when the minimum element has to be selected in each pass. The time
complexity of the FIND MINIMUM procedure is $O(n)$. Since SELECTION SORT calls it once in each of
its $(n-1)$ passes, the time complexity of the SELECTION SORT procedure is $O(n^2)$.


Example 16.6 Consider the list $L = \{6^1, 6^2, 2\}$. The repeating keys have been superscripted
with numbers indicative of their relative orders of occurrence. A trace of the selection sort procedure
is shown below, listing the minimum element selected and the swap performed in each pass.


Pass   List L (During Pass)    Minimum   Swap               List L (After Pass)
1      {6^1, 6^2, 2}           2         2 and 6^1          {2, 6^2, 6^1}
2      {2, 6^2, 6^1}           6^2       6^2 with itself    {2, 6^2, 6^1} (Sorted list)


In the sorted list $6^2$ precedes $6^1$, which reverses their original order of occurrence. The
selection sort on the given list L is therefore not stable.







Merge Sort 16.5 


Merging or collating is a process by which two ordered lists of elements are combined or merged 
into a single ordered list. Merge sort makes use of the principle of merge to sort an unordered list 
of elements and hence the name. In fact a variety of sorting algorithms belonging to the family 
of sorting by merge exist. Some of the well known external sorting algorithms belong to this class. 


Two-way Merging 


Two-way merging deals with the merging of two ordered lists. 
Let $L_1 = \{a_1, a_2, \ldots, a_i, \ldots, a_n\}$, $a_1 \le a_2 \le \cdots \le a_i \le \cdots \le a_n$ and
$L_2 = \{b_1, b_2, \ldots, b_j, \ldots, b_m\}$, $b_1 \le b_2 \le \cdots \le b_j \le \cdots \le b_m$ be
two ordered lists. Merging combines the two lists into a single list L by making use of the
following cases of comparison between the keys $a_i$ and $b_j$ belonging to $L_1$ and $L_2$ respectively:
A1. If ($a_i < b_j$) then drop $a_i$ into the list L
A2. If ($a_i > b_j$) then drop $b_j$ into the list L
A3. If ($a_i = b_j$) then drop both $a_i$ and $b_j$ into the list L


In the case of A1, once $a_i$ is dropped into the list L the next comparison of $b_j$ proceeds with
$a_{i+1}$. In the case of A2, once $b_j$ is dropped into the list L the next comparison of $a_i$ proceeds with
$b_{j+1}$. In the case of A3 the next comparison proceeds with $a_{i+1}$ and $b_{j+1}$. At the end of merge, list
L contains (n+m) ordered elements.

The series of comparisons between pairs of elements from the lists $L_1$ and $L_2$ and the dropping
of the relatively smaller elements into the list L proceeds until one of the following cases happens:

B1. $L_1$ gets exhausted earlier than $L_2$. In such a case, the remaining elements in list $L_2$
are dropped into the list L in the order of their occurrence in $L_2$ and the merge is done.

B2. $L_2$ gets exhausted earlier than $L_1$. In such a case the remaining elements in list $L_1$ are
dropped into the list L in the order of their occurrence in $L_1$ and the merge is done.

B3. Both $L_1$ and $L_2$ are exhausted, in which case merge is done.
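
The comparison cases A1 to A3 and the termination cases B1 to B3 can be sketched in Python as below. The function name `two_way_merge` and the use of two separate Python lists are assumptions made for illustration only.

```python
def two_way_merge(L1, L2):
    """Merge two ascending lists into one ascending list."""
    merged = []
    i = j = 0
    while i < len(L1) and j < len(L2):
        if L1[i] < L2[j]:              # case A1: drop a_i
            merged.append(L1[i])
            i += 1
        elif L1[i] > L2[j]:            # case A2: drop b_j
            merged.append(L2[j])
            j += 1
        else:                          # case A3: drop both a_i and b_j
            merged.append(L1[i])
            merged.append(L2[j])
            i += 1
            j += 1
    merged.extend(L2[j:])              # case B1: L1 exhausted first
    merged.extend(L1[i:])              # case B2: L2 exhausted first
    return merged                      # case B3: both slices above are empty


print(two_way_merge([4, 6, 7, 8], [3, 5, 6]))
```

At most one of the two `extend` calls adds anything, since the loop only ends when at least one list is exhausted.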


Example 16.7 Consider the two ordered lists $L_1$ = {4, 6, 7, 8} and $L_2$ = {3, 5, 6}. Let us merge
the two lists to get the ordered list L. L contains 7 elements in all. Figure 16.1 illustrates the
snapshots of the merge process. Observe how when the elements 6, 6 are compared both the
elements drop into the list L. Also note how list $L_2$ gets exhausted earlier than $L_1$, resulting in all the
remaining elements of list $L_1$ getting flushed into list L.

Algorithm 16.4 illustrates the procedure for merge. Here the two ordered lists to be merged
are given as $(x_1, x_2, \ldots, x_t)$ and $(x_{t+1}, x_{t+2}, \ldots, x_n)$ to enable reuse of the algorithm for merge sort
to be discussed subsequently. The input parameters to procedure MERGE are given as (x, first,
mid, last) where first is the starting index of the first list, mid is the index of the last element of the
first list (the second list begins at mid+1) and last is the ending index of the second list.
The call to merge the two lists $(x_1, x_2, \ldots, x_t)$ and $(x_{t+1}, x_{t+2}, \ldots, x_n)$ would be MERGE(x, 1, t, n).
While the first while loop in the procedure performs the pair wise comparison of elements in the
two lists as discussed in cases A1-A3, the second while loop takes care of the case B1 and the
third loop that of the case B2. Case B3 is inherently taken care of in the first while loop.





[Figure 16.1: snapshots of merging $L_1$ = {4, 6, 7, 8} and $L_2$ = {3, 5, 6} into L. The panels include Initialization, Compare 4, 3, Compare 4, 5, and a final panel in which $L_2$ is exhausted and the remaining elements of $L_1$ are dropped into L.]

Fig. 16.1 Two-way merge


Performance analysis 


The first while loop in Algorithm 16.4 executes at most (last-first+1) times and plays a
significant role in the time complexity of the algorithm. The rest of the while loops only move
the elements of the unexhausted lists into the list L. The complexity of the first while loop and
hence the algorithm is given by O(last-first+1). In the case of merging two lists $(x_1, x_2, \ldots, x_t)$,
$(x_{t+1}, x_{t+2}, \ldots, x_n)$ where the number of elements in the two lists sums to n, the time complexity
of MERGE is given by $O(n)$.







k-way merging 


The two-way merge principle could be extended to k ordered lists, in which case it is termed
k-way merging. Here k ordered lists

$$L_1 = \{a_{11}, a_{12}, \ldots, a_{1i}, \ldots, a_{1 n_1}\}, \quad a_{11} \le a_{12} \le \cdots \le a_{1i} \le \cdots \le a_{1 n_1}$$
$$L_2 = \{a_{21}, a_{22}, \ldots, a_{2i}, \ldots, a_{2 n_2}\}, \quad a_{21} \le a_{22} \le \cdots \le a_{2i} \le \cdots \le a_{2 n_2}$$
$$\vdots$$
$$L_k = \{a_{k1}, a_{k2}, \ldots, a_{ki}, \ldots, a_{k n_k}\}, \quad a_{k1} \le a_{k2} \le \cdots \le a_{ki} \le \cdots \le a_{k n_k}$$

comprising $n_1, n_2, \ldots, n_k$ elements respectively are merged into a single ordered list L
comprising $(n_1 + n_2 + \cdots + n_k)$ elements. At every stage of comparison, k keys, one
from each list, are compared before the smallest of the keys is dropped into the list L. Cases
A1-A3 and B1-B3 discussed in Sec. 16.5 with regard to two-way merge hold good in this case
as well, but as extended to k lists. Illustrative Problem 16.3 discusses an example k-way merge.
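
A k-way merge along these lines might be sketched as follows. The function name `k_way_merge` and the plain linear scan over the k front keys are assumptions for illustration; the text does not prescribe how the smallest of the k keys is found.

```python
def k_way_merge(lists):
    """Merge k ascending lists into one ascending list."""
    positions = [0] * len(lists)       # index of the front key of each list
    merged = []
    while True:
        best = None                    # list holding the smallest front key
        for idx, lst in enumerate(lists):
            p = positions[idx]
            if p >= len(lst):
                continue               # this list is exhausted
            if best is None or lst[p] < lists[best][positions[best]]:
                best = idx
        if best is None:               # every list is exhausted: merge is done
            return merged
        merged.append(lists[best][positions[best]])
        positions[best] += 1


print(k_way_merge([[1, 5, 9], [2, 3], [0, 7, 8, 10]]))
```

Each output element costs up to k comparisons in this sketch, so merging a total of N elements takes O(N k) comparisons.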


Non recursive merge sort procedure 


Given a list $L = \{K_1, K_2, K_3, \ldots, K_n\}$ of unordered elements, merge sort sorts the list making use of
procedure MERGE repeatedly over several passes.

The non recursive version of merge sort merely treats the list L of n elements as n independent
ordered lists of one element each. In pass one, the n singleton lists are pair wise merged. At the
end of pass 1, the merged lists would have a size of 2 elements each. In pass 2, the lists of size
2 are pair wise merged to obtain ordered lists of size 4 and so on. In the $i$-th pass the lists of size
$2^{(i-1)}$ are merged to obtain ordered lists of size $2^i$.

During the passes, if any of the lists are unable to find a pair for their respective merge 
operation, then they are simply carried forward to the next pass. 
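
The pass structure described above can be sketched in Python as follows. The helper names `merge` and `merge_sort_passes` are hypothetical, and the function returns the sublists present after every pass so that the trace can be inspected.

```python
def merge(left, right):
    """Merge two ascending lists; ties take the left element first."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    return out + left[i:] + right[j:]


def merge_sort_passes(L):
    """Return the sublists before pass 1 and after each later pass."""
    runs = [[key] for key in L]        # n ordered lists of one element each
    passes = [runs]
    while len(runs) > 1:
        merged = [merge(runs[p], runs[p + 1])
                  for p in range(0, len(runs) - 1, 2)]
        if len(runs) % 2 == 1:         # an unpaired list is carried forward
            merged.append(runs[-1])
        runs = merged
        passes.append(runs)
    return passes


for sublists in merge_sort_passes([12, 56, 1, 34, 89, 78, 43, 10]):
    print(sublists)
```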


Example 16.8 Consider the list L = {12, 56, 1, 34, 89, 78, 43, 10} to be merge sorted using its 
non recursive formulation. Figure 16.2 illustrates the pair wise merging undertaken in each of the 
passes. The sublists in each pass are shown in brackets. Observe how pass 1 treats the list L as 
8 ordered sublists of one element each and at the end of merge sort, pass 3 obtains a single list 
of size 8 which is the final sorted list. 


LIST L:   [12]  [56]  [1]  [34]  [89]  [78]  [43]  [10]

PASS 1:   [12, 56]  [1, 34]  [78, 89]  [10, 43]

PASS 2:   [1, 12, 34, 56]  [10, 43, 78, 89]

PASS 3:   [1, 10, 12, 34, 43, 56, 78, 89]

Fig. 16.2 Non recursive merge sort of list L = {12, 56, 1, 34, 89, 78, 43, 10} (Example 16.8)







Performance analysis Merge sort proceeds by running several passes over the list that is to
be sorted. In pass 1 sublists of size 1 are merged, in pass 2 sublists of size 2 are merged and in
the $i$-th pass sublists of size $2^{(i-1)}$ are merged. Thus one could expect a total of $\lceil \log_2 n \rceil$ passes over the list.
With the merge operation commanding $O(n)$ time complexity, each pass of merge sort takes $O(n)$
time. The time complexity of merge sort therefore turns out to be $O(n \log_2 n)$.


Stability Merge sort is a stable sort since the original relative orders of occurrence of 
repeating keys are maintained in the sorted list. Illustrative Problem 16.4 demonstrates the 
stability of the sort over a list. 


Recursive merge sort procedure 


The recursive merge sort procedure is built on the design principle of Divide and Conquer. Here, 
the original unordered list of elements is recursively divided roughly into two sublists until the 
sublists are small enough where a merge operation is done before they are combined to yield the 
final sorted list. 

Algorithm 16.5 illustrates the recursive merge sort procedure. The procedure makes use of 
MERGE (Algorithm 16.4) for its merging operation. 


Example 16.9 Let us merge sort the list L = { 12, 56, 1, 34, 89, 78, 43, 10} using Algorithm 16.5. 
The tree of recursive calls demonstrating the working of the procedure on the list L is shown in 
Fig. 16.3. The list is recursively divided into two sublists to be merge sorted before they are 
merged to obtain the final sorted list. Each rectangular node of the tree indicates a procedure call 
to MERGE SORT with the parameters to the call inscribed inside the box. Beneath the parameter
list is shown the output sublist obtained at the end of the execution of the procedure call. 


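15: (L, 1, 8)  [1, 10, 12, 34, 43, 56, 78, 89]
     7: (L, 1, 4)  [1, 12, 34, 56]
          3: (L, 1, 2)  [12, 56]
               1: (L, 1, 1)  [12]
               2: (L, 2, 2)  [56]
          6: (L, 3, 4)  [1, 34]
               4: (L, 3, 3)  [1]
               5: (L, 4, 4)  [34]
    14: (L, 5, 8)  [10, 43, 78, 89]
         10: (L, 5, 6)  [78, 89]
               8: (L, 5, 5)  [89]
               9: (L, 6, 6)  [78]
         13: (L, 7, 8)  [10, 43]
              11: (L, 7, 7)  [43]
              12: (L, 8, 8)  [10]

Fig. 16.3 Tree of recursive calls illustrating recursive merge sort of list L = {12, 56, 1, 34, 89,
78, 43, 10} (Example 16.9)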







The invocation of MERGE SORT (L, 1, 8) generates two other calls viz., MERGE SORT (L, 
1, 4) and MERGE SORT (L, 5, 8) and so on leading to the construction of the tree. Down the 
tree, the procedure calls MERGE SORT (L, 1, 1) and MERGE SORT (L, 2, 2) in that order, 
are the first to terminate releasing the lists [12] and [56] respectively. This triggers the MERGE (L, 
1, 1, 2) procedure yielding the sublist [12, 56] as the output of the procedure call MERGE SORT 
(L, 1, 2). Observe [12, 56] inscribed in the rectangular box 3 which corresponds to the 
procedure call MERGE SORT (L, 1, 2). Proceeding in a similar fashion, it is easy to build the 
tree and obtain the sorted sublists resulting out of each of the calls. The number marked over each 
rectangular node indicates the order of execution of the recursive procedure calls to MERGE SORT. 

With MERGE SORT (L, 1, 4) yielding the sorted sublist [1, 12, 34, 56] and MERGE SORT (L, 
5, 8) yielding [10, 43, 78, 89], the execution of the call MERGE (L, 1, 4, 8) terminates the 
call to MERGE SORT (L, 1, 8) resulting in the sorted list [1, 10, 12, 34, 43, 56, 78, 89]. 


Performance analysis Recursive merge sort follows a Divide and Conquer principle of 
algorithm design. Let T(n) be the time complexity of MERGE SORT where n is the size of the list. 
The recurrence relation for the time complexity of the algorithm is given by 


$$T(n) = \begin{cases} 2\,T\!\left(\dfrac{n}{2}\right) + O(n), & n \ge 2 \\ d, & n = 1 \end{cases}$$

Here $T\!\left(\frac{n}{2}\right)$ is the time complexity for each of the two recursive calls to MERGE SORT over a list
of size $n/2$ and $d$ is a constant. $O(n)$ is the time complexity of merge. Framing the recurrence
relation as

$$T(n) = \begin{cases} 2\,T\!\left(\dfrac{n}{2}\right) + c\,n, & n \ge 2 \\ d, & n = 1 \end{cases}$$

where $c$ is a constant and solving the relation yields the time complexity $T(n) = O(n \log_2 n)$ (see
Illustrative Problem 16.5).
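
One way to see the result is to unroll the recurrence $T(n) = 2\,T(n/2) + c\,n$, $T(1) = d$. The sketch below assumes, for simplicity, that $n$ is a power of two, $n = 2^k$; this assumption is made here for illustration and is not stated in the text.

$$
\begin{aligned}
T(n) &= 2\,T\!\left(\tfrac{n}{2}\right) + c\,n \\
     &= 4\,T\!\left(\tfrac{n}{4}\right) + 2\,c\,n \\
     &= 8\,T\!\left(\tfrac{n}{8}\right) + 3\,c\,n \\
     &\;\;\vdots \\
     &= 2^k\,T\!\left(\tfrac{n}{2^k}\right) + k\,c\,n \\
     &= n\,T(1) + c\,n \log_2 n \\
     &= d\,n + c\,n \log_2 n = O(n \log_2 n)
\end{aligned}
$$

Each level of unrolling doubles the number of subproblems and halves their size, so every level contributes $c\,n$ work and there are $k = \log_2 n$ levels.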


Shell Sort 16.6 


Insertion sort (Sec. 16.3) moves items only one position at a time and therefore reports a time
complexity of $O(n^2)$ on an average. Shell sort is a substantial improvement over insertion sort in
the sense that elements move in long strides rather than single steps, thereby yielding a
comparatively short sub file or a comparatively well ordered sub file which quickens the sorting
process.

The shell sort procedure was proposed by Donald L Shell in 1959. The general idea behind the
method is to choose an increment $h_t$ and divide a list of unordered keys $L = \{K_1, K_2, K_3, \ldots, K_j, \ldots, K_n\}$
into sub lists of keys that are $h_t$ units apart. Each of the sub lists is individually sorted
(preferably insertion sorted) and gathered to form a list. This is known as a pass. Now we repeat
the pass for any sequence of increments $\{h_{t-1}, h_{t-2}, \ldots, h_2, h_1, h_0\}$ where $h_0$ must equal 1. The
increments are kept in the diminishing order and therefore shell sort is also referred to as
diminishing increment sort.







Algorithm 16.4: Procedure for Merge 


procedure MERGE(x, first, mid, last)
/* x[first:mid] and x[mid+1:last] are ordered lists of
   data elements to be merged into a single ordered list
   x[first:last] */

   first1 = first;
   last1  = mid;
   first2 = mid + 1;
   last2  = last;    /* set the beginning and the ending indexes of the two
                        lists into the appropriate variables */
   i = first;        /* i is the index variable for the temporary output list
                        temp */

   /* begin pair wise comparisons of elements from the two
      lists */
   while (first1 <= last1) and (first2 <= last2) do
      case
         : x[first1] < x[first2]: { temp[i] = x[first1];
                                    first1 = first1 + 1;
                                    i = i + 1;
                                  }
         : x[first1] > x[first2]: { temp[i] = x[first2];
                                    first2 = first2 + 1;
                                    i = i + 1;
                                  }
         : x[first1] = x[first2]: { temp[i] = x[first1];
                                    temp[i+1] = x[first2];
                                    first1 = first1 + 1;
                                    first2 = first2 + 1;
                                    i = i + 2;
                                  }
      end /* end case */
   end /* end while */

   /* the first list gets exhausted */
   while (first2 <= last2) do
      temp[i] = x[first2];
      first2 = first2 + 1;
      i = i + 1;
   end

   /* the second list gets exhausted */
   while (first1 <= last1) do
      temp[i] = x[first1];
      first1 = first1 + 1;
      i = i + 1;
   end

   /* copy list temp to list x */
   for j = first to last do
      x[j] = temp[j];
   end
end MERGE.




Algorithm 16.5: Procedure for Recursive Merge Sort 


procedure MERGE SORT(a, first, last)
/* a[first:last] is the unordered list of elements to be
   merge sorted. The call to the procedure to sort the
   list a[1:n] would be MERGE SORT(a, 1, n) */

   if (first < last) then
   {
      mid = floor((first + last)/2);    /* divide the list into two sublists */

      MERGE SORT(a, first, mid);        /* merge sort the sublist a[first, mid] */
      MERGE SORT(a, mid+1, last);       /* merge sort the sublist a[mid+1, last] */

      MERGE(a, first, mid, last);       /* merge the two sublists a[first, mid] and
                                           a[mid+1, last] */
   }

end MERGE SORT.
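
A Python sketch of Algorithms 16.4 and 16.5 is given below. It is an illustration under stated assumptions rather than a transcription: indexes start at 0, the temporary list is built with `append` instead of being indexed, and equal keys are handled by taking the element of the left sublist first.

```python
def merge(a, first, mid, last):
    """Merge the ordered slices a[first..mid] and a[mid+1..last] in place."""
    temp = []
    i, j = first, mid + 1
    while i <= mid and j <= last:
        if a[j] < a[i]:
            temp.append(a[j])
            j += 1
        else:                          # on a tie the left element goes first
            temp.append(a[i])
            i += 1
    temp.extend(a[i:mid + 1])          # second sublist exhausted first
    temp.extend(a[j:last + 1])         # first sublist exhausted first
    a[first:last + 1] = temp           # copy temp back into a


def merge_sort(a, first, last):
    """Recursively merge sort a[first..last] in place."""
    if first < last:
        mid = (first + last) // 2      # divide the list into two sublists
        merge_sort(a, first, mid)
        merge_sort(a, mid + 1, last)
        merge(a, first, mid, last)


data = [12, 56, 1, 34, 89, 78, 43, 10]
merge_sort(data, 0, len(data) - 1)
print(data)
```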





Example 16.10 illustrates shell sort on the given list L for an increment sequence { 8, 4, 2, 1}. 


Example 16.10 Trace the shell sort procedure on the unordered list L of keys given by
L = {24, 37, 46, 11, 85, 47, 33, 66, 22, 84, 95, 55, 14, 09, 76, 35} for an increment sequence
$\{h_3, h_2, h_1, h_0\}$ = {8, 4, 2, 1}.

The steps traced are shown in Fig. 16.4. Pass 1 for an increment 8, divides the unordered list 
L into 8 sublists each comprising 2 keys, that are 8 units apart. After each of the sublists have 
been individually insertion sorted, they are gathered together for the next pass. 

In Pass 2, for an increment 4, the list gets divided into 4 groups, each comprising elements 
which are 4 units apart in the list L. The individual sub lists are again insertion sorted and 
gathered together for the next pass and so on, until in Pass 4 the entire list gets sorted for an 
increment 1. 

The shell sort, in fact, could work for any sequence of increments so long as $h_0$ equals 1. Several
empirical results and theoretical investigations have been undertaken regarding the conditions to
be followed by the sequence of increments. Example 16.11 illustrates shell sort for the same list
L used in Example 16.10 but for a different sequence of increments, viz., {7, 5, 3, 1}.


Example 16.11 Trace the shell sort procedure on the unordered list L of keys given by
L = {24, 37, 46, 11, 85, 47, 33, 66, 22, 84, 95, 55, 14, 09, 76, 35} for an increment sequence
$\{h_3, h_2, h_1, h_0\}$ = {7, 5, 3, 1}.

Figure 16.5 illustrates the steps involved in the sorting process. In Pass 1, the increment of 7
divides the list L into 7 groups of varying number of elements. The sub lists are insertion
sorted and gathered for the next pass. In Pass 2, for an increment of 5, the list L gets divided into
5 groups of varying number of elements. As before they are insertion sorted and so on until in
Pass 4 the entire list gets sorted for an increment of 1.

Algorithm 16.6 describes the skeletal shell sort procedure. The array L[1:n] represents the
unordered list of keys, $L = \{K_1, K_2, K_3, \ldots, K_j, \ldots, K_n\}$. H is the sequence of increments
$\{h_t, h_{t-1}, h_{t-2}, \ldots, h_2, h_1, h_0\}$.







Unordered list L:
K1  K2  K3  K4  K5  K6  K7  K8  K9  K10 K11 K12 K13 K14 K15 K16
24  37  46  11  85  47  33  66  22  84  95  55  14  09  76  35

Pass 1 (increment h3 = 8)
(K1 K9)  (K2 K10)  (K3 K11)  (K4 K12)  (K5 K13)  (K6 K14)  (K7 K15)  (K8 K16)
(24 22)  (37 84)   (46 95)   (11 55)   (85 14)   (47 09)   (33 76)   (66 35)
After insertion sort:
(22 24)  (37 84)   (46 95)   (11 55)   (14 85)   (09 47)   (33 76)   (35 66)
List L after Pass 1:
K1  K2  K3  K4  K5  K6  K7  K8  K9  K10 K11 K12 K13 K14 K15 K16
22  37  46  11  14  09  33  35  24  84  95  55  85  47  76  66

Pass 2 (increment h2 = 4)
(K1 K5 K9 K13)  (K2 K6 K10 K14)  (K3 K7 K11 K15)  (K4 K8 K12 K16)
(22 14 24 85)   (37 09 84 47)    (46 33 95 76)    (11 35 55 66)
After insertion sort:
(14 22 24 85)   (09 37 47 84)    (33 46 76 95)    (11 35 55 66)
List L after Pass 2:
K1  K2  K3  K4  K5  K6  K7  K8  K9  K10 K11 K12 K13 K14 K15 K16
14  09  33  11  22  37  46  35  24  47  76  55  85  84  95  66

Pass 3 (increment h1 = 2)
(K1 K3 K5 K7 K9 K11 K13 K15)  (K2 K4 K6 K8 K10 K12 K14 K16)
(14 33 22 46 24 76 85 95)     (09 11 37 35 47 55 84 66)
After insertion sort:
(14 22 24 33 46 76 85 95)     (09 11 35 37 47 55 66 84)
List L after Pass 3:
K1  K2  K3  K4  K5  K6  K7  K8  K9  K10 K11 K12 K13 K14 K15 K16
14  09  22  11  24  35  33  37  46  47  76  55  85  66  95  84

Pass 4 (increment h0 = 1)
(K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 K11 K12 K13 K14 K15 K16)
(14 09 22 11 24 35 33 37 46 47 76 55 85 66 95 84)
After insertion sort:
(09 11 14 22 24 33 35 37 46 47 55 66 76 84 85 95)
Sorted List L:
K1  K2  K3  K4  K5  K6  K7  K8  K9  K10 K11 K12 K13 K14 K15 K16
09  11  14  22  24  33  35  37  46  47  55  66  76  84  85  95

Fig. 16.4 Shell sorting of L = {24, 37, 46, 11, 85, 47, 33, 66, 22, 84, 95, 55, 14, 09, 76, 35}
for the increment sequence {8, 4, 2, 1}







Unordered list L:
K1  K2  K3  K4  K5  K6  K7  K8  K9  K10 K11 K12 K13 K14 K15 K16
24  37  46  11  85  47  33  66  22  84  95  55  14  09  76  35

Pass 1 (increment h3 = 7)
(K1 K8 K15)  (K2 K9 K16)  (K3 K10)  (K4 K11)  (K5 K12)  (K6 K13)  (K7 K14)
(24 66 76)   (37 22 35)   (46 84)   (11 95)   (85 55)   (47 14)   (33 09)
After insertion sort:
(24 66 76)   (22 35 37)   (46 84)   (11 95)   (55 85)   (14 47)   (09 33)
List L after Pass 1:
K1  K2  K3  K4  K5  K6  K7  K8  K9  K10 K11 K12 K13 K14 K15 K16
24  22  46  11  55  14  09  66  35  84  95  85  47  33  76  37

Pass 2 (increment h2 = 5)
(K1 K6 K11 K16)  (K2 K7 K12)  (K3 K8 K13)  (K4 K9 K14)  (K5 K10 K15)
(24 14 95 37)    (22 09 85)   (46 66 47)   (11 35 33)   (55 84 76)
After insertion sort:
(14 24 37 95)    (09 22 85)   (46 47 66)   (11 33 35)   (55 76 84)
List L after Pass 2:
K1  K2  K3  K4  K5  K6  K7  K8  K9  K10 K11 K12 K13 K14 K15 K16
14  09  46  11  55  24  22  47  33  76  37  85  66  35  84  95

Pass 3 (increment h1 = 3)
(K1 K4 K7 K10 K13 K16)  (K2 K5 K8 K11 K14)  (K3 K6 K9 K12 K15)
(14 11 22 76 66 95)     (09 55 47 37 35)    (46 24 33 85 84)
After insertion sort:
(11 14 22 66 76 95)     (09 35 37 47 55)    (24 33 46 84 85)
List L after Pass 3:
K1  K2  K3  K4  K5  K6  K7  K8  K9  K10 K11 K12 K13 K14 K15 K16
11  09  24  14  35  33  22  37  46  66  47  84  76  55  85  95

Pass 4 (increment h0 = 1)
(K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 K11 K12 K13 K14 K15 K16)
(11 09 24 14 35 33 22 37 46 66 47 84 76 55 85 95)
After insertion sort:
(09 11 14 22 24 33 35 37 46 47 55 66 76 84 85 95)
Sorted List L:
K1  K2  K3  K4  K5  K6  K7  K8  K9  K10 K11 K12 K13 K14 K15 K16
09  11  14  22  24  33  35  37  46  47  55  66  76  84  85  95

Fig. 16.5 Shell sorting of L = {24, 37, 46, 11, 85, 47, 33, 66, 22, 84, 95, 55, 14, 09, 76, 35}
for the increment sequence {7, 5, 3, 1}.







Algorithm 16.6: Procedure for Shell Sort 


procedure SHELL SORT(L, n, H)
/* L[1:n] is the unordered list of keys to be shell sorted.
   L = {K_1, K_2, K_3, ..., K_j, ..., K_n} and H = {h_t, h_(t-1), ..., h_1, h_0} is the
   sequence of increments */
   for each h_j in H do
      Insertion sort the sublists of elements in L[1:n]
      which are h_j units apart, such that
      L[i] <= L[i + h_j], for 1 <= i <= n - h_j

   end


   print (L)
end SHELL SORT.
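
The skeletal procedure can be fleshed out in Python as sketched below. The name `shell_sort` and the optional `trace` argument are assumptions introduced here for illustration. The sublists that are h units apart are insertion sorted in an interleaved manner inside a single loop, which leaves the list in the same state after each pass as sorting the sublists one by one.

```python
def shell_sort(L, increments, trace=None):
    """Shell sort L in place for a diminishing sequence ending in 1."""
    if not increments or increments[-1] != 1:
        raise ValueError("the last increment must be 1")
    for h in increments:
        # Insertion sort every sublist of keys that are h units apart.
        for i in range(h, len(L)):
            key = L[i]
            j = i
            while j >= h and L[j - h] > key:
                L[j] = L[j - h]        # shift the larger key h places right
                j -= h
            L[j] = key
        if trace is not None:
            trace.append(list(L))      # snapshot of L after this pass
    return L


keys = [24, 37, 46, 11, 85, 47, 33, 66, 22, 84, 95, 55, 14, 9, 76, 35]
snapshots = []
shell_sort(keys, [8, 4, 2, 1], snapshots)
for pass_no, snapshot in enumerate(snapshots, start=1):
    print(pass_no, snapshot)
```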
Analysis of shell sort 


The analysis of shell sort is dependent on a given choice of increments. Since there is no best 
possible sequence of increments that has been formulated, especially for large values of n (the size 
of the list L), the time complexity of shell sort is not completely resolved. In fact it has led to some 
interesting mathematical problems! An interested reader is referred to Sec. 5.2.1 of Donald 
Knuth’s book (Art of Computer Programming vol. III : Sorting and Searching, Second edition, Pearson 
Education, 2002) for discussions on these results. 


Quick Sort 16.7 





Quick sort procedure formulated by C.A.R. Hoare belongs to the family of sorting by exchange or 
transposition where elements that are out of order are exchanged amongst themselves to obtain the 
sorted list. 

The procedure works on the principle of partitioning the unordered list into two sublists at 
every stage of the sorting process based on what is called a pivot element. The two sublists occur 
to the left and right of the pivot element. The pivot element determines its appropriate position 
in the sorted list and is therefore freed of its participation in the subsequent stages of the sorting 
process. Again each of the sublists are partitioned against their respective pivot elements until no 
more partitioning can be called for. At this stage all the elements would have determined their 
appropriate positions in the sorted list and quick sort is done. 


Partitioning 


Consider an unordered list $L = \{K_1, K_2, K_3, \ldots, K_n\}$. How does partitioning occur? Let us choose
$K = K_1$ to be the pivot element. Now $K$ compares itself with each of the keys on a left to right
encounter looking for the first key $K_i$, $K_i \ge K$. Again $K$ compares itself with each of the keys on
a right to left encounter looking for the first key $K_j$, $K_j \le K$. If $K_i$ and $K_j$ are such that $i < j$, then
$K_i$ and $K_j$ are exchanged. Figure 16.6(a) illustrates the process of exchange.

Now $K$ moves ahead from position index $i$ on a left to right encounter looking for a key $K_s$,
$K_s \ge K$. Again as before, $K$ moves on a right to left encounter beginning from position index $j$
looking for a key $K_t$, $K_t \le K$. As before if $s < t$, then $K_s$ and $K_t$ are exchanged and the process repeats
(Fig. 16.6(b)). If $s > t$, then $K$ exchanges itself with $K_t$, the key which is the smaller of $K_s$ and $K_t$. At
this stage a partition is said to occur. The pivot element $K$, which has now exchanged position with
$K_t$, is the key around which the list splits itself into two. Figure 16.6(c)
illustrates partition.

[Figure 16.6: (a) Exchange $K_i$ and $K_j$ ($i < j$), where $K_i$ is the first occurring element from the left with $K_i \ge K$ and $K_j$ is the first occurring element from the right with $K_j \le K$. (b) Exchange $K_s$ and $K_t$ ($s < t$). (c) Exchange the pivot element $K$ and $K_t$ ($s > t$), partitioning L into two sublists: elements less than or equal to $K$ on the left and elements greater than or equal to $K$ on the right.]

Fig. 16.6 Partitioning in Quick Sort

Now what do we observe about the partitioned sublists and the pivot element?
(i) The sublist occurring to the left of the pivot element $K$ (now at position $t$) has all its elements
less than or equal to $K$ and the sublist occurring to the right of the pivot element $K$ has all
its elements greater than or equal to $K$.







(ii) The pivot element has settled down to its appropriate position which would turn out to 
be its rank in the sorted list. 


Example 16.12 Let L = {34, 26, 1, 45, 18, 78, 12, 89, 27} be an unordered list of elements. We 
now demonstrate the process of partitioning on the above list. Let us choose 34 as the pivot 
element. Figure 16.7 illustrates the snap shots of partitioning the list. Here 34 moves left to right 
looking for the first element that is greater than or equal to it and spots 45. Again moving from 
right to left looking for the first element less than or equal to 34, it spots 27. Since the position 
index of 45 is less than that of 27 (arrows face each other), they are exchanged. 

Proceeding from the points where the moves were last stopped, 34 encounters 78 during its 
left to right move and encounters 12 during its right to left move. As before the arrows face each 
other resulting in an exchange of 78 and 12. In the next lap of the move we notice the elements 
78 and 12 are spotted again but this time note that the arrows have crossed each other. This 
implies that the position index of 78 is greater than that of 12 calling for a partition. 34 exchanges 
position with 12 and the list is partitioned into two as shown. 

It may be seen that all elements less than or equal to 34 have accumulated to its left and those
greater than or equal to 34 have accumulated to its right. Again the pivot element 34 has settled
down at position index 6 which is its rank in the sorted list.


L:                     34  26   1  45  18  78  12  89  27
Pivot element:         34

45, 27 spotted:        34  26   1  45  18  78  12  89  27    (arrows face each other)
Exchange 45, 27:       34  26   1  27  18  78  12  89  45
78, 12 spotted:        34  26   1  27  18  78  12  89  45    (arrows face each other)
Exchange 78, 12:       34  26   1  27  18  12  78  89  45
78, 12 spotted:        34  26   1  27  18  12  78  89  45    (arrows have crossed each other)
Call for partition,
exchange 12 and
pivot element 34:      [12  26   1  27  18]  (34)  [78  89  45]


Fig. 16.7 Partitioning a list (Example 16.12)


Quick sort procedure 


Once the method behind partitioning is known, quick sort is nothing but repeated partitioning 
until every pivot element settles down to its appropriate position thereby sorting the list. 
Algorithm 16.8 illustrates the quick sort procedure. The algorithm employs the Divide and 
Conquer principle by exploiting procedure PARTITION ( Algorithm 16.7) to partition the list into 
two sublists and recursively calling procedure QUICK SORT to sort the two sublists. 
Procedure PARTITION partitions the list L[first:last] at the position loc where the pivot 
element settles down. 







Algorithm 16.7: Procedure for Partition 
procedure PARTITION(L, first, last, loc)
/* L[first:last] is the list to be partitioned. loc is the
   position where the pivot element finally settles down */

   left = first;
   right = last + 1;
   pivot_elt = L[first];      /* set the pivot element to the first
                                 element in list L */
   while (left < right) do

      repeat
         left = left + 1;     /* pivot element moves left to right */
      until L[left] >= pivot_elt;
      repeat
         right = right - 1;   /* pivot element moves right to left */
      until L[right] <= pivot_elt;
      if (left < right) then swap(L[left], L[right]);   /* arrows face each
                                                            other */
   end
   loc = right
   swap(L[first], L[right]);  /* arrows have crossed each other - exchange
                                 pivot element L[first] with L[right] */
end PARTITION.
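
A Python sketch of the partition step is shown below. It assumes 0-based indexing and returns the final position of the pivot instead of using an output parameter. The bounds check on the left to right scan is an addition made here so that the scan cannot run past the end of the list when no key is greater than or equal to the pivot.

```python
def partition(L, first, last):
    """Partition L[first..last] around L[first]; return the pivot's position."""
    pivot = L[first]
    left, right = first, last + 1
    while left < right:
        left += 1                      # scan left to right for a key >= pivot
        while left <= last and L[left] < pivot:
            left += 1
        right -= 1                     # scan right to left for a key <= pivot
        while L[right] > pivot:
            right -= 1
        if left < right:               # arrows face each other
            L[left], L[right] = L[right], L[left]
    L[first], L[right] = L[right], L[first]   # arrows have crossed
    return right


keys = [34, 26, 1, 45, 18, 78, 12, 89, 27]
loc = partition(keys, 0, len(keys) - 1)
print(loc, keys)
```

With 0-based indexing the pivot 34 of Example 16.12 settles at index 5, which corresponds to position index 6 in the text.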


Example 16.13 Let us quick sort the list L = {5, 1, 26, 15, 76, 34, 15}. The various phases of
the sorting process are shown in Fig. 16.8. When the partitioned sublists contain only one element
then no sorting is done. Also in phase 4 of Fig. 16.8 observe how the pivot element 34 exchanges
with itself. The final sorted list is {1, 5, 15, 15, 26, 34, 76}.


Algorithm 16.8: Procedure for Quick Sort 


procedure QUICK SORT(L, first, last)
/* L[first:last] is the unordered list of elements to be
   quick sorted. The call to the procedure to sort the
   list L[1:n] would be QUICK SORT(L, 1, n) */

   if (first < last) then
   {
      PARTITION(L, first, last, loc);   /* partition the list into two
                                           sublists at loc */
      QUICK SORT(L, first, loc-1);      /* quick sort the sublist
                                           L[first, loc-1] */
      QUICK SORT(L, loc+1, last);       /* quick sort the sublist
                                           L[loc+1, last] */
   }
end QUICK SORT.
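
The recursion can be sketched in Python as follows. The sketch assumes 0-based indexing, and the optional `key` argument is a hypothetical addition that lets records with equal keys be told apart, which makes the effect on their relative order visible.

```python
def partition(L, first, last, key):
    """Partition L[first..last] around L[first]; return the pivot's position."""
    pivot = key(L[first])
    left, right = first, last + 1
    while left < right:
        left += 1
        while left <= last and key(L[left]) < pivot:
            left += 1
        right -= 1
        while key(L[right]) > pivot:
            right -= 1
        if left < right:
            L[left], L[right] = L[right], L[left]
    L[first], L[right] = L[right], L[first]
    return right


def quick_sort(L, first, last, key=lambda item: item):
    """Quick sort L[first..last] in place, comparing items by key(item)."""
    if first < last:
        loc = partition(L, first, last, key)
        quick_sort(L, first, loc - 1, key)    # sublist left of the pivot
        quick_sort(L, loc + 1, last, key)     # sublist right of the pivot


numbers = [5, 1, 26, 15, 76, 34, 15]
quick_sort(numbers, 0, len(numbers) - 1)
print(numbers)

# Records are (key, order of occurrence); only the key is compared.
records = [(5, 1), (5, 2), (5, 3)]
quick_sort(records, 0, len(records) - 1, key=lambda record: record[0])
print(records)
```

In the second call the three records carry the same key, so any change in their order of occurrence is caused by the exchanges made during partitioning.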


Stability and performance analysis 


Quick sort is not a stable sort. During the partitioning process keys which are equal are subject 
to exchange and hence undergo changes in their relative orders of occurrence in the sorted list. 




L: {5, 1, 26, 15, 76, 34, 15}

Phase 1: Pivot element 5
                              5   1  26  15  76  34  15
   List L after partition:    [1] (5) [26 15 76 34 15]

Phase 2: List [1] needs no quick sort.
         Quick sort list [26 15 76 34 15]
         Pivot element 26
                              26  15  76  34  15
                              26  15  15  34  76
   List L after partition:    (1) (5) [15 15] (26) [34 76]

Phase 3: Quick sort list [15, 15]
         Pivot element 15
                              15  15
   List L after partition:    (1) (5) [15] (15) (26) [34 76]

Phase 4: List [15] needs no quick sort.
         Quick sort [34, 76]
         Pivot element 34
                              34  76
   List L after partition:    (1) (5) (15) (15) (26) (34) [76]

The final sorted list: {1, 5, 15, 15, 26, 34, 76}

Fig. 16.8 Snapshots of the quick sort process (Example 16.13)


Example 16.14 Let us quick sort the list L = {5^1, 5^2, 5^3} where the superscripts indicate the
relative orders of their occurrence in the list. Figure 16.9 illustrates the sorting process. It can
be easily seen that quick sort is not stable.

[Figure: snapshots of quick sorting L = {5^1, 5^2, 5^3}. The final sorted list is
L = {5^2, 5^1, 5^3}. Quick sort is unstable.]
Fig. 16.9 Stability of Quick Sort

Quick sort reports its worst case performance when the list is already sorted in ascending
order (see Illustrative Problem 16.6). The worst case time complexity of the algorithm is
O(n^2). However, quick sort reports a good average case complexity of O(n log n).
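The worst case behaviour can be observed by counting key comparisons. The following is a minimal Python sketch and not the PARTITION and QUICK_SORT procedures of this chapter: it assumes a first-element pivot with a single left-to-right scan (a partition scheme chosen here only for brevity), and the names quick_sort_count, partition and sort are illustrative.

```python
def quick_sort_count(keys):
    """Quick sort a copy of keys; return (sorted_list, number_of_key_comparisons)."""
    a = list(keys)
    comparisons = 0

    def partition(first, last):
        nonlocal comparisons
        pivot = a[first]
        boundary = first                 # a[first+1 .. boundary] holds keys < pivot
        for i in range(first + 1, last + 1):
            comparisons += 1
            if a[i] < pivot:
                boundary += 1
                a[boundary], a[i] = a[i], a[boundary]
        a[first], a[boundary] = a[boundary], a[first]   # drop pivot into place
        return boundary

    def sort(first, last):
        if first < last:
            loc = partition(first, last)
            sort(first, loc - 1)
            sort(loc + 1, last)

    sort(0, len(a) - 1)
    return a, comparisons


ordered, count = quick_sort_count([1, 2, 3, 4, 5, 6, 7, 8])
print(ordered, count)
```

For a list of n keys already in ascending order, every partition under this scheme leaves an empty left sublist, so the count is (n-1) + (n-2) + ... + 1 = n(n-1)/2, which is 28 for n = 8.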





Heap Sort 16.8 


Heap sort is a sorting procedure belonging to the family of sorting by selection. This class of sorting
algorithms is based on the principle of repeated selection of either the smallest or the largest
key from the remaining elements of the unordered list and their inclusion in an output list. At
every pass of the sort, the smallest or the largest key is selected by a well devised method and
added to the output list and when all the elements have been selected the output list yields the
sorted list.

Heap sort is built on a data structure called heap and hence the name heap sort. The heap data 
structure aids the selection of the largest (or smallest) key from the remaining elements of the list. 
Heap sort proceeds in two phases viz., 

(i) construction of a heap where the unordered list of elements to be sorted are converted into 
a heap, and 

(ii) repeated selection and inclusion of the root node key of the heap into the output list after 
reconstructing the remaining tree into a heap. 


Heap 


A heap is a complete binary tree in which each parent node u labeled by a key or element e(u)
and its respective child nodes v, w labeled e(v), e(w) respectively are such that e(u) ≥ e(v) and
e(u) ≥ e(w). Since the parent node keys are greater than or equal to their respective child node
keys at each level, the key at the root node would turn out to be the largest amongst all the keys
represented as a heap.

It is also possible to define the heap such that the root holds the smallest key for which every 
parent node key should be less than or equal to that of its child nodes. However, by convention 
a heap sticks to the principle of the root holding the largest element. 


Example 16.15 The binary tree shown in Fig. 16.10(a) is a heap while that shown in 
Fig. 16.10(b) is not. 


[Figure: two binary trees, (a) Heap and (b) Non heap]
Fig. 16.10 An example heap and non heap

It may be observed in Fig. 16.10(a) how each parent node key is greater than or equal to that
of its child node keys. As a result the root represents the largest key in the heap. In contrast the
non heap shown in Fig. 16.10(b) violates the above characteristics.


Construction of heap 


Given an unordered list of elements it is essential that a heap is first constructed before heap sort
works on it to yield the sorted list. Let L = {K_1, K_2, K_3, ..., K_n} be the unordered list. The
construction of the heap proceeds by inserting keys from L one by one into an existing heap.







K_1 is inserted into the initially empty heap as its root. K_2 is inserted as the left child of K_1. If
the property of heap is violated then K_1 and K_2 swap positions to construct a heap out of
themselves. Next K_3 is inserted as the right child of node K_1. If K_3 violates the property of heap
it swaps position with its parent K_1 and so on.

In general, a key K_i is inserted into the heap as the child of node ⌊i/2⌋ following the principle
of complete binary tree (the parent of child i is given by ⌊i/2⌋ and the left and right children of i
are given by 2i and (2i+1) respectively). If the property of the heap is violated then it calls for a
swap between K_i and K_⌊i/2⌋, which in turn may trigger further adjustments between K_⌊i/2⌋ and its
parent and so on. In short a major adjustment across the tree may have to be carried out to
reconstruct the heap.

Though a heap is a binary tree, the principle of complete binary tree which it follows favors
its representation as an array (see Sec. 8.5). The algorithms pertaining to heap and heap sort
employ arrays for their implementation of heaps.
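With the array numbering of a complete binary tree (the parent of node i sits at position ⌊i/2⌋), the heap property can be checked mechanically. The following is a minimal Python sketch; the helper name is_max_heap is an assumption of this example and does not appear in the text.

```python
def is_max_heap(keys):
    """Return True if keys, read as the 1-based array L[1:n], is a heap
    whose every parent key is greater than or equal to its child keys."""
    n = len(keys)
    for child in range(2, n + 1):
        parent = child // 2                     # parent of node i is floor(i/2)
        # Python lists are 0-based, so position i is stored at keys[i - 1].
        if keys[parent - 1] < keys[child - 1]:
            return False
    return True


print(is_max_heap(list("HFGEADCB")))   # every parent >= its children
print(is_max_heap(list("DBGEAHCF")))   # G at position 3 exceeds its parent D
```

Only the parent of each node needs to be examined, since checking every child against its parent covers every parent and child pair of the tree.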


Example 16.16 Let us construct a heap out of L = {D, B, G, E, A, H, C, F}. Figure 16.11 
illustrates the step by step process of insertion and heap reconstruction before the final heap is 
obtained. The adjustments made between the keys of the node during the heap reconstruction 
are shown in dotted lines. 


[Figure: for each element inserted from L, the tree before heap reconstruction and after
heap reconstruction]
Fig. 16.11 Construction of heap (Example 16.16)


As mentioned earlier, for the implementation of the algorithm for the construction of a heap, it
is convenient to make use of an array representation. Thus if the list L = {D, B, G, E, A, H, C, F}
shown in Example 16.16 is represented as an array then the same after construction of the heap
would be as shown in Fig. 16.12. Algorithm 16.9 illustrates the procedure for inserting a key K
(L[child_index]) into an existing heap L[1:child_index-1].


List L as an array L[1:8] before heap construction:

    D    B    G    E    A    H    C    F
   [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]

List L as an array L[1:8] after heap construction:

    H    F    G    E    A    D    C    B
   [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]

Fig. 16.12 Array representation of a heap for the list L = {D, B, G, E, A, H, C, F}







Algorithm 16.9: Procedure for inserting a key into a heap


procedure INSERT_HEAP(L, child_index)
/* L[1:child_index-1] is an existing heap into which
   L[child_index] is to be inserted */
heap = false;
parent_index = ⌊child_index/2⌋;        /* identify parent */
while (not heap) and (child_index > 1) do
   if (L[parent_index] < L[child_index]) then   /* heap property
                                                    violated: swap
                                                    parent and child */
   {
      swap(L[parent_index], L[child_index]);
      child_index = parent_index;
      parent_index = ⌊child_index/2⌋;
   }
   else heap = true;
end
end INSERT_HEAP.





To build a heap out of a list L[1 : n], each element beginning from L[2] to L[n] will have to be 
inserted one by one into the constructed heap. Algorithm 16.10 illustrates the procedure of 
constructing a heap out of L[1 : n]. Illustrative Problem 16.8 illustrates the trace of the algorithm 
for the construction of a heap given a list of elements. 


Algorithm 16.10: Procedure for construction of heap


procedure CONSTRUCT_HEAP(L, n)
/* L[1:n] is a list to be constructed into a heap */

for child_index = 2 to n do
   INSERT_HEAP(L, child_index);   /* insert elements one by one
                                     into the heap */
end

end CONSTRUCT_HEAP.
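The two procedures above can be sketched in Python as follows. This is an illustrative transcription under stated assumptions rather than a definitive implementation: slot 0 of the Python list is left unused so that the indices match the 1-based array L[1:n], and the function names insert_heap and construct_heap are chosen for this example.

```python
def insert_heap(heap, child_index):
    """Insert heap[child_index] into the existing heap heap[1 .. child_index-1]
    by swapping it upwards while its parent holds a smaller key."""
    while child_index > 1:
        parent_index = child_index // 2
        if heap[parent_index] < heap[child_index]:
            heap[parent_index], heap[child_index] = heap[child_index], heap[parent_index]
            child_index = parent_index
        else:
            break                      # heap property holds, nothing more to do


def construct_heap(keys):
    """Return a new list holding the keys rearranged as a heap."""
    heap = [None] + list(keys)         # slot 0 unused: indices are 1-based
    for child_index in range(2, len(heap)):
        insert_heap(heap, child_index)
    return heap[1:]


print(construct_heap([12, 45, 21, 67, 34]))
```

Run on the list of Illustrative Problem 16.8, the sketch yields {67, 45, 21, 12, 34}, the heap reported in that problem.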


Heap sort procedure 


To sort an unordered list L = {K_1, K_2, K_3, ..., K_n}, heap sort procedure first constructs a heap out
of L. The root which holds the largest element of L swaps places with the largest numbered node
of the tree. The largest numbered node is now disabled from further participation in the heap
reconstruction process. This is akin to the highest key of the list getting included in the output
list. Now the remaining tree with (n-1) active nodes is again reconstructed to form a heap. The
root node now holds the next largest element of the list. The swapping of the root node with the
next largest numbered node in the tree which is disabled thereafter, yields a tree with (n-2)
active nodes and so on. This process of heap reconstruction and outputting the root node to the
output list continues until the tree is left with no active nodes. At this stage heap sort is done
and the output list contains the elements in the sorted order.


Example 16.17 Let us heap sort the list L = {D, B, G, E, A, H, C, F} made use of in 
Example 16.16. The first phase of heap sort is to construct a heap out of the list. The heap 
constructed for the list L is shown in Fig. 16.11. 

In the second stage the root node key is exchanged with the largest numbered node of the tree 
and the heap reconstruction of the remaining tree continues until the entire list is sorted. 
Figure 16.13 illustrates the second stage of heap sort. The disabled nodes of the tree are shown 
shaded in grey. After reconstruction of the heap the nodes are numbered to indicate the largest 
numbered node that is to be swapped with the root of the heap. The sorted list is obtained as 
L = {A, B, C, D, E, F, G, H}. 

[Figure: the initial heap of the list L in L[1:8], followed for each output key by the tree
before reconstruction and after reconstruction of the heap. Heap sort complete:
L = {A, B, C, D, E, F, G, H}]
Fig. 16.13 Heap sorting of the list L = {D, B, G, E, A, H, C, F} (Example 16.17)


Algorithm 16.11 illustrates the heap sort procedure. The procedure CONSTRUCT_HEAP builds the
initial heap out of the list L given as input. RECONSTRUCT_HEAP reconstructs the heap after the root
node and the largest numbered node have been swapped. Procedure HEAP_SORT accepts the list
L[1:n] as input and returns the output sorted list in L itself.


Algorithm 16.11: Procedures for Heap Sort


procedure HEAP_SORT(L, n)
/* L[1:n] is the unordered list to be sorted. The output list
   is returned in L itself */
CONSTRUCT_HEAP(L, n);      /* construct the initial heap out of L[1:n] */
BUILD_TREE(L, n);          /* output root node and reconstruct heap */
end HEAP_SORT.


procedure BUILD_TREE(L, n)
for end_node_index = n to 2 step -1 do
{
   swap(L[1], L[end_node_index]);         /* swap root node with the largest
                                             numbered node (end node) */
   RECONSTRUCT_HEAP(L, end_node_index);   /* reconstruct the heap */
}
end BUILD_TREE.


procedure RECONSTRUCT_HEAP(L, end_node_index)
heap = false;
parent_index = 1;
child_index = parent_index * 2;

while (not heap) and (child_index < end_node_index) do
   right_child_index = child_index + 1;
   if (right_child_index < end_node_index)   /* choose which of
                                                the child nodes is greater
                                                than or equal to the parent */
   then
      if (L[right_child_index] > L[child_index])
      then child_index = right_child_index;
   if (L[child_index] > L[parent_index])
   then {
      swap(L[child_index], L[parent_index]);
      parent_index = child_index;
      child_index = parent_index * 2;
   }
   else heap = true;
end

end RECONSTRUCT_HEAP.
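The two phases of heap sort can be sketched in Python as follows. This is a hedged illustration, not a line by line transcription of Algorithm 16.11: slot 0 of the list is left unused to keep the 1-based numbering, reconstruct_heap receives the number of active nodes rather than the index of the end node, and the function names are assumptions of this example.

```python
def reconstruct_heap(heap, size):
    """Restore the heap over positions 1..size after the root has been replaced,
    by swapping the root downwards with its larger child."""
    parent = 1
    while 2 * parent <= size:
        child = 2 * parent
        if child < size and heap[child + 1] > heap[child]:
            child += 1                           # right child is the larger one
        if heap[child] > heap[parent]:
            heap[child], heap[parent] = heap[parent], heap[child]
            parent = child
        else:
            break                                # heap property restored


def heap_sort(keys):
    """Return the keys in ascending order."""
    heap = [None] + list(keys)                   # slot 0 unused: 1-based indices
    n = len(keys)
    for child in range(2, n + 1):                # phase 1: construct the heap
        while child > 1 and heap[child // 2] < heap[child]:
            heap[child // 2], heap[child] = heap[child], heap[child // 2]
            child //= 2
    for end in range(n, 1, -1):                  # phase 2: output root, rebuild
        heap[1], heap[end] = heap[end], heap[1]  # largest key joins the output
        reconstruct_heap(heap, end - 1)          # node 'end' is now disabled
    return heap[1:]


print(heap_sort(list("DBGEAHCF")))
```

The disabled nodes accumulate at the tail of the array, so the output list is formed in place without any additional storage.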


Stability and performance comparison 


Heap sort is an unstable sort (see Illustrative Problem 16.9). The time complexity of heap sort is 
O(n logn). 


Radix Sort 16.9 





Radix Sort belongs to the family of sorting by distribution where keys are repeatedly 
distributed into groups or classes based on the digits or the characters forming the key until the 
entire list at the end of a distribution phase gets sorted. For a long time this sorting procedure 
was used to sort punched cards. Radix sort is also known as bin sort or bucket sort or digital 
sort. 


Radix sort method 


Given a list L of n keys where each key K is made up of l digits, K = {k_1 k_2 k_3 ... k_l},
radix sort undertakes sorting by distributing the keys based on the digits forming the key. If the
distribution proceeds from the least significant digit (LSD) onwards and progresses left digit
after digit, then it is termed LSD first sort. We illustrate LSD first sort in this section.

Let us consider the case of LSD first sort of the list L of n keys each comprising l digits, i.e.,
K = {k_1 k_2 k_3 ... k_l} where each k_i is such that 0 ≤ k_i < r. Here r is termed the radix of the key
representation and hence the name radix sort. Thus if L were to deal with decimal keys then the
radix would be 10. If the keys were to be octal the radix would be 8 and if they were to be
hexadecimal it would be 16 and so on.

In order to understand the distribution passes of the LSD first sort procedure, we assume that
r bins corresponding to the radix of the keys are present. In the first pass of the sort, all the keys
of the list L, based on the value of their last digit, viz., k_l, are thrown into their respective bins.
At the end of the distribution, the keys are collected in order from each of the bins. At this stage
the keys are said to have been sorted based on their LSD. In the second pass we undertake a
similar distribution of the keys throwing them into the bins based on their next digit, k_(l-1).
Collecting them in order from the bins yields the keys sorted according to their last but one
digit. The distribution continues for l passes at the end of which the entire list L is obtained
sorted.







Example 16.18 Consider a list L = { 387, 690, 234, 435, 567, 123, 441}. Here, the number of 
elements n = 7, the number of digits l = 3 and radix r = 10. This means that radix sort would 
require 10 bins and would complete the sorting in 3 passes. 

Figure 16.14 illustrates the passes of radix sort over the list. It is assumed that each key is 
thrown into the bin face down. At the end of each pass, when the keys are collected from each 
bin in order, the list of keys in each bin are turned upside down to be appended to the output 
list. 


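L = {387, 690, 234, 435, 567, 123, 441}

Pass 1 (distribution on the last digit)
   Bin 0: 690
   Bin 1: 441
   Bin 3: 123
   Bin 4: 234
   Bin 5: 435
   Bin 7: 387, 567
L = {690, 441, 123, 234, 435, 387, 567}

Pass 2 (distribution on the last but one digit)
   Bin 2: 123
   Bin 3: 234, 435
   Bin 4: 441
   Bin 6: 567
   Bin 8: 387
   Bin 9: 690
L = {123, 234, 435, 441, 567, 387, 690}

Pass 3 (distribution on the first digit)
   Bin 1: 123
   Bin 2: 234
   Bin 3: 387
   Bin 4: 435, 441
   Bin 5: 567
   Bin 6: 690
L = {123, 234, 387, 435, 441, 567, 690}
Radix Sort Complete

Fig. 16.14 Radix Sort (Example 16.18). Bins not listed in a pass are empty; keys within a bin
are listed in the order in which they were thrown in.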


During the implementation of the radix sort procedure in the computer, it is convenient to
make use of linked lists for the representation of the bins. The linked list implementation of the
sort for the list shown in Example 16.18 is illustrated in Fig. 16.15. Here the bins are implemented
as an array of head nodes (shaded in grey). Each of the headed linked lists representing the bins
could be implemented as a linked queue with two pointers front and rear, pointing to the
first and last node of the singly linked list respectively. At the end of each pass, the elements from
each list could be appended to the output list by undertaking deletions in each of the linear
queues representing the bins until they are empty.

[Figure: the bins as an array of head nodes with their linked lists after each pass]
After Pass 1: List L = {690, 441, 123, 234, 435, 387, 567}
After Pass 2: List L = {123, 234, 435, 441, 567, 387, 690}
After Pass 3: List L = {123, 234, 387, 435, 441, 567, 690}
Fig. 16.15 Linked list implementation of radix sort (Example 16.18)

Algorithm 16.12 illustrates the skeletal procedure for the LSD first radix sort. 


Algorithm 16.12: Procedure for radix sort


procedure RADIX_SORT(L, n, r, d)
/* radix sort sorts a list L of n keys, each comprising d digits
   with radix r */

Initialize each of the Q[0:r-1] linked queues representing the bins to be
empty;

for i = d to 1 step -1      /* for each of the d passes over the list */
   Sort the list L of n keys K = k_1 k_2 k_3 ... k_d based on the digit i,
   inserting each of the keys K into the linked queue Q[k_i],
   0 ≤ k_i < r;             /* distribute the keys into Q[0:(r-1)] based
                               on the radix value of the digits */

   Delete the keys from the queues Q[0:r-1] in order, and append
   the elements to the output list L;
end
return (L);

end RADIX_SORT.
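The skeletal procedure can be made concrete for non-negative integer keys. The following Python sketch is one possible rendering under stated assumptions: the bins are collections.deque objects standing in for the linked queues Q[0:r-1], digits are extracted arithmetically, and the function name radix_sort_lsd together with the returned list of per-pass snapshots are additions made only for illustration.

```python
from collections import deque


def radix_sort_lsd(keys, digits, radix=10):
    """LSD first radix sort of non-negative integers having at most `digits`
    digits in the given radix. Returns (sorted_keys, list_after_each_pass)."""
    keys = list(keys)
    bins = [deque() for _ in range(radix)]      # one queue per digit value
    passes = []
    divisor = 1                                 # selects the current digit
    for _ in range(digits):
        for key in keys:                        # distribution phase
            bins[(key // divisor) % radix].append(key)
        keys = []
        for queue in bins:                      # collection phase, bin 0 first
            while queue:
                keys.append(queue.popleft())    # first in, first out
        passes.append(list(keys))
        divisor *= radix
    return keys, passes


result, passes = radix_sort_lsd([387, 690, 234, 435, 567, 123, 441], digits=3)
for number, snapshot in enumerate(passes, start=1):
    print("Pass", number, snapshot)
```

Because each bin is emptied in first in, first out order, keys with equal digits retain their relative order from the previous pass, which is what makes the digit by digit passes combine into a correct sort.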


Most Significant Digit first sort: Radix sort can also be undertaken by considering the most
significant digits of the key first. The distribution proceeds from the most significant digit (MSD)
of the key onwards and progresses right digit after digit. In such a case, the sort is termed MSD
first sort.

MSD first sort is similar to what happens in a post office during the sorting of letters. Using
the pin code, the letters are first sorted into zones, for a zone into its appropriate states, for a
state into its districts and so on until the piles are small enough for efficient delivery to the
respective neighborhoods. Similarly, MSD first sort distributes the keys to the appropriate bins
based on the MSD. If the sub pile in each bin is small enough then it is prudent to use a non radix
sort method to sort each of the sub piles and gather them together. On the other hand, if the sub
pile in each bin is not small enough, then each of the sub piles is once again radix sorted based on
the second digit and so on until the entire list of keys gets sorted.
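The MSD first strategy can be sketched recursively. In the following Python sketch the threshold parameter small (the pile size below which a non radix method takes over), the use of the built-in sorted as that non radix method, and the function name msd_radix_sort are all assumptions made for illustration; keys are taken to be non-negative integers of at most `digits` digits.

```python
def msd_radix_sort(keys, digits, radix=10, small=2, position=0):
    """MSD first radix sort. `position` counts digits from the most
    significant end (0 is the MSD)."""
    # A small pile, or one whose digits are all used up, is handed over to
    # a non radix method (here Python's built-in sorted).
    if len(keys) <= small or position == digits:
        return sorted(keys)
    bins = [[] for _ in range(radix)]
    divisor = radix ** (digits - 1 - position)  # selects the current digit
    for key in keys:
        bins[(key // divisor) % radix].append(key)
    gathered = []
    for pile in bins:                           # gather sub piles, bin 0 first
        gathered.extend(msd_radix_sort(pile, digits, radix, small, position + 1))
    return gathered


print(msd_radix_sort([387, 690, 234, 435, 567, 123, 441], digits=3))
```

Unlike the LSD first version, each sub pile is processed independently of the others, so the recursion can stop early for piles that are already small.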


Performance analysis 


The performance of the radix sort algorithm is given by O(d.(n+r)) where d is the number of 
passes made over the list of keys of size n and radix r. Each pass reports a time complexity of 
O(n+r) and therefore for d passes the time complexity is given by O(d.(n+r)). 







Summary





> Sorting deals with the problem of arranging elements in a list according to the ascending 
or descending order. 

> Internal sort refers to sorting of lists or files that can be accommodated in the internal 
memory of the computer. On the other hand, external sorting deals with sorting of files or 
lists that are too huge to be accommodated in the internal memory of the computer and 
hence need to be stored in external storage devices such as disks or drums. 

> The internal sorting methods of bubble sort, insertion sort, selection sort, merge sort, quick
sort, shell sort, heap sort and radix sort are discussed.

> Bubble sort belongs to the family of sorting by exchange or transposition. In each pass, 
the elements are compared pair wise until the largest key amongst the participating 
elements bubbles to the end of the list. 

> Insertion sort belongs to the family of sorting by insertion. The sorting method is based 
on the principle that a new key K is inserted at its appropriate position in an already 
sorted sub list. 

> Selection sort is built on the principle of selecting the minimum element of the list and 
exchanging it with the element in the first position of the list, and so on until the whole 
list gets sorted. 

> Merge sort belonging to the family of sorting by merge makes use of the principle of merge 
to sort an unordered list. The sorted sublists of the original lists are merged to obtain the 
final sorted list. 

> Shell sort divides the list into sub lists of elements making use of a sequence of increments 
and insertion sorts each sub list, before gathering them for the subsequent pass. The passes 
are repeated for each of the increment until the entire list gets sorted. 

> Quick sort belongs to the family of sorting by exchange. The procedure works on the 
principle of partitioning the unordered list into two sublists at every stage of the sorting 
process based on the pivot element and recursively quick sorting the sublists. 

> Heap sort belongs to the family of sorting by selection. It is based on the principle of 
repeated selection of either the smallest or the largest key from the remaining elements 
of the unordered list constructed as a heap, for inclusion in the output list. 

> Radix sort belongs to the family of sorting by distribution and is classified as LSD first sort 
and MSD first sort. The sorting of the keys is undertaken digit wise in d passes where d 
is the number of digits in the keys over a radix r. 


Illustrative Problems 


Problem 16.1 Trace bubble sort algorithm on the list L = {K, Q, A, N, C, A, P, T, V, B}. Verify 
stability of bubble sort over L. 


Solution: The partially sorted lists at the end of the respective passes of the sort are shown
below. Repeated elements in the list have been superscripted with indexes.

Unsorted list   {K, Q, A^1, N, C, A^2, P, T, V, B}
Pass 1          {K, A^1, N, C, A^2, P, Q, T, B, V}
Pass 2          {A^1, K, C, A^2, N, P, Q, B, T, V}
Pass 3          {A^1, C, A^2, K, N, P, B, Q, T, V}
Pass 4          {A^1, A^2, C, K, N, B, P, Q, T, V}
Pass 5          {A^1, A^2, C, K, B, N, P, Q, T, V}
Pass 6          {A^1, A^2, C, B, K, N, P, Q, T, V}
Pass 7          {A^1, A^2, B, C, K, N, P, Q, T, V}
Pass 8          {A^1, A^2, B, C, K, N, P, Q, T, V}
Pass 9          {A^1, A^2, B, C, K, N, P, Q, T, V}

Since the relative order of positions of equal keys remains unaffected even after the sort, bubble
sort is stable over L.


Problem 16.2 Trace the passes of insertion sort on the following lists: 
(i) {H, K, M, N, P} 
(ii) {P, N, M, K, H} 

Compare their performance in terms of the comparisons made. 


Solution: The lists at the end of each pass of insertion sort are shown in Table I 16.2. It may 
be observed that while list (i) is already in its ascending order, list (ii) is in its descending order. 
The sorted sublists are shown in brackets. The number of comparisons made in each of the passes 
is shown in bold. While list (i) needs to make a total of 4 comparisons, list (ii) needs to make a 
total of 10 comparisons to sort themselves using insertion sort. 


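Table I 16.2

Pass   Insertion sort of    Number of      Insertion sort of    Number of
       {H, K, M, N, P}      comparisons    {P, N, M, K, H}      comparisons

1      {[H K] M N P}        1              {[N P] M K H}        1
2      {[H K M] N P}        1              {[M N P] K H}        2
3      {[H K M N] P}        1              {[K M N P] H}        3
4      {[H K M N P]}        1              {[H K M N P]}        4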
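The comparison totals quoted in the solution can be reproduced by instrumenting insertion sort. The following Python sketch assumes the usual formulation in which the new key is compared with the sorted sublist from right to left and stops at the first key that is not larger; the function name insertion_sort_count is illustrative.

```python
def insertion_sort_count(keys):
    """Insertion sort a copy of keys; return (sorted_list, key_comparisons)."""
    a = list(keys)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:            # found the position: stop scanning
                break
            a[j + 1] = a[j]            # shift the larger key one place right
            j -= 1
        a[j + 1] = key
    return a, comparisons


print(insertion_sort_count(list("HKMNP")))
print(insertion_sort_count(list("PNMKH")))
```

The ascending list costs one comparison per pass (4 in all), while the descending list costs 1 + 2 + 3 + 4 = 10 comparisons.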





Problem 16.3 Undertake 3-way merge for the lists shown below:
L_1 = {F, J, L}, L_2 = {F, H, M, N, P}, L_3 = {G, M}


Solution: The snapshots of the 3-way merge of the lists into the list L are shown in Fig. I 16.3. At
every stage the leading elements of the three lists are compared and the smallest of them is
dropped into the list L. At the end of step 5, L_1 gets exhausted and at the end of step 6, L_3 also gets
exhausted. In the last step, the remaining elements in list L_2 are merely flushed into list L.


Step                                        Output list L

0. Initialization                           { }
   (L_1 = F J L, L_2 = F H M N P, L_3 = G M)
1. Compare F, F, G                          {F, F}
2. Compare J, H, G                          {F, F, G}
3. Compare J, H, M                          {F, F, G, H}
4. Compare J, M, M                          {F, F, G, H, J}
5. Compare L, M, M                          {F, F, G, H, J, L}
6. Compare M, M (L_1 is exhausted)          {F, F, G, H, J, L, M, M}
7. L_1, L_3 exhausted. Flush L_2 into L     {F, F, G, H, J, L, M, M, N, P}

Fig. I 16.3
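The merge traced in this problem generalises to any number of sorted lists. The following Python sketch drops one key per step, whereas the trace above drops equal leading keys together; the function name k_way_merge and the rule that ties go to the earliest list are assumptions of this example.

```python
def k_way_merge(*lists):
    """Merge any number of individually sorted lists into one sorted list."""
    positions = [0] * len(lists)               # next unread index in each list
    merged = []
    while True:
        best = None                            # list holding the smallest head
        for which, lst in enumerate(lists):
            pos = positions[which]
            if pos < len(lst) and (
                best is None or lst[pos] < lists[best][positions[best]]
            ):
                best = which
        if best is None:                       # every list is exhausted
            return merged
        merged.append(lists[best][positions[best]])
        positions[best] += 1


print(k_way_merge(list("FJL"), list("FHMNP"), list("GM")))
```

Using a strict comparison means that, among equal keys, the one from the earliest list is output first, so the merge preserves the relative order of equal keys.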




Problem 16.4 Undertake non recursive merge sort for the list L = {78, 78, 78, 1} and check
for the stability of the sort.


Solution: We undertake the non recursive formulation of merge sort procedure for the list
L = {78^1, 78^2, 78^3, 1}. The repeated keys in the list L are superscripted to track their orders of
occurrence in the list. Figure I 16.4 shows the passes over the list L. The final sorted list verifies
that merge sort is stable.

L:        [78^1] [78^2] [78^3] [1]
Pass 1:   [78^1, 78^2] [1, 78^3]
Pass 2:   [1, 78^1, 78^2, 78^3]
Fig. I 16.4


Problem 16.5 Solve the recurrence relation for the time complexity of merge sort given in
Sec. 16.5 assuming the size of the list n = 2^k.


Solution: The recurrence relation is given by

$$T(n) = 2\,T\!\left(\frac{n}{2}\right) + c\,n, \quad n \ge 2$$
$$T(n) = d, \quad n = 1$$

Solving the relation results in the following steps:

$$
\begin{aligned}
T(n) &= 2\,T\!\left(\frac{n}{2}\right) + c\,n && \text{(i)} \\
     &= 2\left(2\,T\!\left(\frac{n}{4}\right) + c\,\frac{n}{2}\right) + c\,n \\
     &= 2^2\,T\!\left(\frac{n}{2^2}\right) + 2\,c\,n && \text{(ii)} \\
     &= 2^3\,T\!\left(\frac{n}{2^3}\right) + 3\,c\,n && \text{(iii)}
\end{aligned}
$$

In step (k), T(n) is obtained as

$$
\begin{aligned}
T(n) &= 2^k\,T\!\left(\frac{n}{2^k}\right) + k\,c\,n \\
     &= n\,T(1) + \log_2 n \cdot c\,n \qquad (n = 2^k,\ k = \log_2 n) \\
     &= n\,d + c\,n \log_2 n \\
     &= O(n \log_2 n)
\end{aligned}
$$


Problem 16.6 Quick sort the list L = {A, B, N, M, P, R}. What are your observations? How can
the observations help you in determining the worst case complexity of quick sort?


Solution: The quick sort process is demonstrated in Fig. I 16.6. Since the list is already in its
ascending order, during each phase of the sort, the elements in the order given get thrown out
one by one during the subsequent partitions. In other words with each partition the size of the
list decrements by 1. The quick sort procedure is therefore recursively called for lists of sizes
n, (n-1), (n-2), ..., 3, 2, 1. Hence the worst case time complexity is given by
O(n + (n-1) + (n-2) + ... + 3 + 2 + 1) = O(n^2).

L: {A, B, N, M, P, R}
Phase 1: List L after partition: (A) [B N M P R]
Phase 2: List L after partition: (A) (B) [N M P R]
In the remaining phases the elements get freed one by one due to partitioning, yielding the
sorted list.
Fig. I 16.6


Problem 16.7 Discuss a procedure to obtain the rank of an element K in an array LIST[1 : n]. 
How can procedure PARTITION (Algorithm 16.7) be effectively used for the same problem? What 
are the time complexities of the methods discussed? 


Solution: A direct method to obtain the rank of an element in an array LIST[1:n] is to sort the
list and search for the element K in the sorted list. The time complexity of the procedure in such
a case would be O(n log_2 n), since the best comparison based sorting algorithm reports a time
complexity of O(n log_2 n).

In the case of employing procedure PARTITION for the problem, K is first compared with the pivot
element (P) that gets dropped off the list during the first partition. If K = P then the problem is
done: the index of P in the list LIST is the rank of the element K. On the other hand, if
(K < P) or (K > P) then it only calls for searching for the rank of K in the one sublist
occurring to the left or right of P, by repeatedly invoking procedure PARTITION. Hence in this
case the time complexity would be O(n) on average, since every partition discards one of the two
sublists. When the partitions are as lopsided as those of Illustrative Problem 16.6, however, the
sublist shrinks by only one element at a time and the time complexity degrades to O(n^2).
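The partition based search for the rank can be sketched as follows. This Python sketch is illustrative only: it assumes distinct keys, uses a first-element pivot with a single left-to-right scan in place of procedure PARTITION of Algorithm 16.7, and the function name rank_of is hypothetical.

```python
def rank_of(keys, k):
    """Return the 1-based rank of k among distinct keys, or None if k is absent.
    Only the sublist that can still contain k is partitioned again."""
    a = list(keys)
    first, last = 0, len(a) - 1
    while first <= last:
        pivot = a[first]
        boundary = first                       # a[first+1 .. boundary] < pivot
        for i in range(first + 1, last + 1):
            if a[i] < pivot:
                boundary += 1
                a[boundary], a[i] = a[i], a[boundary]
        a[first], a[boundary] = a[boundary], a[first]
        if k == pivot:
            return boundary + 1                # pivot sits at its sorted position
        if k < pivot:
            last = boundary - 1                # continue in the left sublist
        else:
            first = boundary + 1               # continue in the right sublist
    return None


print(rank_of([12, 45, 21, 67, 34], 34))
```

For the list {12, 45, 21, 67, 34} the key 34 is the third smallest, so its rank is 3.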


Problem 16.8 Trace procedure CONSTRUCT HEAP (Algorithm 16.10) over the list L[1:5] = { 12, 
45, 21, 67, 34}. 


Solution: Table I 16.8 illustrates the trace of the procedure CONSTRUCT_HEAP which invokes
procedure INSERT_HEAP repeatedly for the elements belonging to L[2:5] = {45, 21, 67, 34}. At the
end of the execution, procedure CONSTRUCT_HEAP yields the list L[1:5] = {67, 45, 21, 12, 34} which
is the heap.


Table I 16.8

Procedure call       child_index   parent_index   L[1:5]                  Remarks

INSERT_HEAP(L, 2)    2             1              {45, 12, 21, 67, 34}    L[1] < L[2]; swap(L[1], L[2]) done
INSERT_HEAP(L, 3)    3             1              {45, 12, 21, 67, 34}    L[1] > L[3]; no swap
INSERT_HEAP(L, 4)    4             2              {45, 67, 21, 12, 34}    L[2] < L[4]; swap(L[2], L[4]) done
                     2             1              {67, 45, 21, 12, 34}    L[1] < L[2]; swap(L[1], L[2]) done
INSERT_HEAP(L, 5)    5             2              {67, 45, 21, 12, 34}    L[2] > L[5]; no swap























Problem 16.9 Test for the stability of heap sort on the list L = {7^1, 7^2, 7^3}.


Solution: Figure I 16.9 demonstrates the heap sort process on L. The final sorted list is
L = {7^2, 7^3, 7^1}. This verifies that heap sort is unstable.

[Figure: the initial heap for L = {7^1, 7^2, 7^3} and, for each output key, the tree before
reconstruction and after reconstruction of the heap. Heap sort complete: L = {7^2, 7^3, 7^1}]
Fig. I 16.9


Problem 16.10 Radix sort the list L = { 001, 101, 010, 000, 111, 110, 011, 100}. 


Solution: The list L commands the following parameters: n = 8, d = 3 and r = 2. The radix sort 
process is shown in Fig. I 16.10. 


Problem 16.11 Selection sort the list L = {H, V, A, T, P, M, K}.
Solution: The sorting steps are shown below. For each pass, the table lists the minimum element
of the sub list considered for the pass and the list obtained after that minimum element has been
swapped with the element in the first position of the sub list. The elements to the left of the sub
list are excluded from the pass.

Pass   List L (During Pass)       Minimum   List L (After Pass)

1      {H, V, A, T, P, M, K}      A         {A, V, H, T, P, M, K}
2      {A, V, H, T, P, M, K}      H         {A, H, V, T, P, M, K}
3      {A, H, V, T, P, M, K}      K         {A, H, K, T, P, M, V}
4      {A, H, K, T, P, M, V}      M         {A, H, K, M, P, T, V}
5      {A, H, K, M, P, T, V}      P         {A, H, K, M, P, T, V}
6      {A, H, K, M, P, T, V}      T         {A, H, K, M, P, T, V}

Sorted list: {A, H, K, M, P, T, V}







L = {001, 101, 010, 000, 111, 110, 011, 100}

Phase 1:
   Bin 0: 010, 000, 110, 100
   Bin 1: 001, 101, 111, 011
List L = {010, 000, 110, 100, 001, 101, 111, 011}

Phase 2:
   Bin 0: 000, 100, 001, 101
   Bin 1: 010, 110, 111, 011
List L = {000, 100, 001, 101, 010, 110, 111, 011}

Phase 3:
   Bin 0: 000, 001, 010, 011
   Bin 1: 100, 101, 110, 111
List L = {000, 001, 010, 011, 100, 101, 110, 111}

The final radix sorted list
L = {000, 001, 010, 011, 100, 101, 110, 111}
Fig. I 16.10


Problem 16.12 Test whether shell sort is stable on the list L = {7, 5^1, 5^2, 5^3, 5^4, 5^5, 5^6, 5^7,
5^8, 5^9} for a sequence of increments {4, 2, 1}. The repeated occurrences of element 5 have been
superscripted with their orders of occurrence.


Solution: The trace of shell sort on the list L is shown in Fig. I 16.12. It is unstable.

Unordered list L:
   K_1   K_2   K_3   K_4   K_5   K_6   K_7   K_8   K_9   K_10
   7     5^1   5^2   5^3   5^4   5^5   5^6   5^7   5^8   5^9

Pass 1 (increment h = 4)
   Sub lists:             (K_1 K_5 K_9)   (K_2 K_6 K_10)   (K_3 K_7)   (K_4 K_8)
                          (7 5^4 5^8)     (5^1 5^5 5^9)    (5^2 5^6)   (5^3 5^7)
   After insertion sort:  (5^4 5^8 7)     (5^1 5^5 5^9)    (5^2 5^6)   (5^3 5^7)
   List L after Pass 1:   {5^4, 5^1, 5^2, 5^3, 5^8, 5^5, 5^6, 5^7, 7, 5^9}

Pass 2 (increment h = 2)
   Sub lists:             (K_1 K_3 K_5 K_7 K_9)   (K_2 K_4 K_6 K_8 K_10)
                          (5^4 5^2 5^8 5^6 7)     (5^1 5^3 5^5 5^7 5^9)
   After insertion sort:  (5^4 5^2 5^8 5^6 7)     (5^1 5^3 5^5 5^7 5^9)
   List L after Pass 2:   {5^4, 5^1, 5^2, 5^3, 5^8, 5^5, 5^6, 5^7, 7, 5^9}

Pass 3 (increment h = 1)
   Sub list:              (5^4 5^1 5^2 5^3 5^8 5^5 5^6 5^7 7 5^9)
   After insertion sort:  (5^4 5^1 5^2 5^3 5^8 5^5 5^6 5^7 5^9 7)

Sorted list L: {5^4, 5^1, 5^2, 5^3, 5^8, 5^5, 5^6, 5^7, 5^9, 7}

Fig. I 16.12
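The instability observed in this problem can be checked mechanically by tagging each key with its order of occurrence. In the following Python sketch the records are (key, tag) pairs compared on the key alone; the function name shell_sort and the key parameter are assumptions of this example, and the inner loop insertion sorts all the sub lists of an increment in an interleaved fashion, which yields the same result as sorting them one after another.

```python
def shell_sort(records, increments, key=lambda record: record):
    """Shell sort a copy of records for the given sequence of increments."""
    a = list(records)
    for h in increments:
        for i in range(h, len(a)):             # insertion sort with gap h
            item = a[i]
            j = i - h
            while j >= 0 and key(a[j]) > key(item):
                a[j + h] = a[j]                # shift the larger record right
                j -= h
            a[j + h] = item
    return a


# Tag 0 marks the key 7; tags 1..9 mark the successive occurrences of key 5.
records = [(7, 0)] + [(5, tag) for tag in range(1, 10)]
result = shell_sort(records, [4, 2, 1], key=lambda record: record[0])
print([tag for _, tag in result])
```

The tags of the keys equal to 5 come out in the order 4, 1, 2, 3, 8, 5, 6, 7, 9 rather than 1 to 9, which confirms that equal keys have changed their relative order.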


Review Questions


1. Which of the following is unstable sort? 


(a) Quick sort (b) Insertion sort (c) Bubble sort (d) Merge sort 
2. The worst case time complexity of quick sort is 

(a) O(n) (b) O(n?) (c) O(n.logn) (d) O(n’) 
3. Which among the following belongs to the family of sorting by selection? 

(a) merge sort (b) quick sort (c) heap sort (d) shell sort 


4. Which among the following actions does not occur during the 2-way merge of two lists
L_1 and L_2 into the output list L?
(a) If both L_1 and L_2 get exhausted, then the merge is done.
(b) If L_1 gets exhausted before L_2, then simply output the remaining elements of L_2 into
L and the merge is done.
(c) If L_2 gets exhausted before L_1, then simply output the remaining elements of L_1 into
L and the merge is done.
(d) If one of the two lists (L_1 or L_2) gets exhausted, with the other still containing elements,
then the merge is done.
5. For a list L = {7, 3, 9, 1, 8} the output list at the end of Pass 1 of bubble sort would yield
(a) {3, 7, 1, 9, 8} (b) {3, 7, 1, 8, 9} (c) {8, 1, 7, 9, 3} (d) {1, 3, 7, 8, 9}
6. Distinguish between internal sorting and external sorting. 
7. When is a sorting process said to be stable? 
8. Why is bubble sort stable? 
9. What is k-way merging? 
10. What is the time complexity of selection sort?
11. Distinguish between a heap and a binary search tree. Give an example. 
12. What is the principle behind Shell sort? 


13. When is radix sort termed LSD first sort?
14. What is the principle behind the Quick sort procedure?
15. What is the time complexity of merge sort?
16. Can bubble sort ever perform better than quick sort? If so, list a case.
17. Trace (i) bubble sort (ii) insertion sort and (iii) selection sort on the list L = {H, V, A, X, G,
...}.
18. Demonstrate 3-way merging on the lists:
L_1 = {123, 678, 345, 225, 890, 345, 111}, L_2 = {345, 123, 654, 789, 912, 144,
267, 909, 111, 324} and L_3 = {567, 222, 111, 900, 545, 897}
19. Trace Quick sort on the list L = {11, 34, 67, 78, 78, 78, 99}. What are your observations?
20. Trace Radix sort on the following list:
L = {5678, 2341, 90, 3219, 7676, 8704, 4561, 5000}
21. Undertake heap sort for the list L shown in Review Question 17 (Chapter 16).


Programming Assignments


1. Implement (i) Bubble sort and (ii) Insertion sort in a language of your choice. Test for the
performance of the two algorithms on input files with elements already sorted in
(i) descending order and (ii) ascending order. Record the number of comparisons. What are
your observations?
2. Implement Quick sort algorithm. Enhance the algorithm to test for its stability.
3. Implement the non recursive version of merge sort. Enhance the implementation to test for
its stability.
4. Implement the LSD first and MSD first versions of Radix sort for alphabetical keys.
5. Implement Heap sort with the assumption that the smallest element of the list floats to the
root during the construction of heap.
6. Implement shell sort for a given sequence of increments. Display the output list at the end
of each pass.


CHAPTER 17 


EXTERNAL SORTING 


17.1 Introduction 
17.2 External storage devices 
17.3 Sorting with tapes: Balanced merge 
17.4 Sorting with disks: Balanced merge 
17.5 Polyphase merge sort 
17.6 Cascade merge sort 


Introduction 17.1 


Internal sorting deals with the ordering of records (or keys) of a file (or list) in the ascending 
or descending order when the whole file or list is compact enough to be accommodated in the 
internal memory of the computer. Chapter 16 detailed internal sorting techniques such as Bubble 
Sort, Insertion Sort, Selection sort, Merge Sort, Shell sort, Quick Sort, Heap Sort and Radix Sort. 

However, in many applications and problems it is quite common to encounter huge files 
comprising millions of records which need to be sorted for their effective use in the application 
concerned. The application domains of e-governance, digital library, search engines, on-line 
telephone directory and electoral system, to list a few, deal with voluminous files of records. 

Majority of the internal sorting techniques that we learned are virtually incapable of sorting 
large files since they require the whole file in the internal memory of the computer, which is 
impossible. Hence the need for external sorting methods which are exclusive strategies to sort 
huge files. 


The principle behind external sorting 


Due to their large volume, the files are stored in external storage devices such as tapes, disks or 
drums. The external sorting strategies therefore need to take into consideration the kind of 
medium on which the files reside, since these influence their work strategy. 

The files residing on these external storage devices are read piecemeal, since only as many 
records as can be accommodated in the internal memory of the computer can be read at a time. 
These batches of records are sorted making use of any efficient internal sorting method. Each 
sorted batch of records is referred to as a run. The file is now viewed as a collection of runs. 
The runs, as and when they are generated, are written out onto the external storage devices. The 
variety in the external sorting methods for a particular storage device is brought about only by 
the ways in which these runs are gathered and processed before the final sorted file is obtained. 
However, the majority of the popular external sorting methods make use of merge sort for gathering 
and processing the runs. 







A common principle behind most popular external sorting methods is outlined below: 
(i) Internally sort batches of records from the source file to generate runs. Write out the runs 
as and when they are generated, onto the external storage device(s). 
(ii) Merge the runs generated in the earlier phase, to obtain larger but fewer runs, and write 
them out onto the external storage devices. 
(iii) Repeat the run generation and merge until, in the final phase, only one run gets generated, 
at which point the sorting of the file is done. 
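
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: the "file" is an in-memory list, `memory_capacity` stands for the number of records the internal memory can hold, and the function names (`generate_runs`, `merge_pass`, `external_sort`) are hypothetical. A real implementation would read and write the runs on tapes or disks.

```python
from heapq import merge  # standard-library merge of sorted inputs


def generate_runs(records, memory_capacity):
    """Step (i): sort memory-sized batches internally; each sorted batch is a run."""
    return [sorted(records[i:i + memory_capacity])
            for i in range(0, len(records), memory_capacity)]


def merge_pass(runs):
    """Step (ii): merge the runs pairwise into fewer but longer runs."""
    merged = []
    for i in range(0, len(runs), 2):
        pair = runs[i:i + 2]               # the last run may have no partner
        merged.append(list(merge(*pair)))  # an unpaired run is simply copied
    return merged


def external_sort(records, memory_capacity):
    """Step (iii): repeat the merge until a single run remains."""
    runs = generate_runs(records, memory_capacity)
    while len(runs) > 1:
        runs = merge_pass(runs)
    return runs[0] if runs else []


if __name__ == "__main__":
    print(external_sort([5, 3, 9, 1, 8, 2, 7], memory_capacity=3))
```

The 2-way merge is used here only because it is the simplest choice; the later sections discuss higher order merges.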

Since external storage devices play a prominent role in external sorting, we discuss sorting 
methods as applicable to two popular storage devices, viz., magnetic tapes and magnetic disks, the 
latter commonly referred to as hard disks. The reason for the choice is that these devices are 
representative of two different genres and display different characteristics. While magnetic tapes 
are undoubtedly obsolete these days, it is worthwhile to go through the external sorting methods 
applicable to these devices, considering the amount of research effort and innovation that went 
into them during their heyday. 

The following section briefly discusses the external storage devices of magnetic tapes and 
disks. The external sorting method of balanced merge applicable to files stored on both tapes and 
disks is elaborately discussed. A crisp description of polyphase merge and cascade merge sort 
procedures is presented finally. 





External Storage Devices 17.2 


In this section we briefly explain the characteristics of magnetic tapes and magnetic disks. 


Magnetic tapes 


Magnetic tape is a sequential device whose principle is similar to that of an audio tape /cassette 
device. It consists of a reel of magnetic tape, approximately ½” wide and wound round a spool. 
Data is stored on the tape using the principle of magnetization. Each tape has about 7 or 9 tracks 
running lengthwise. A spot on the tape represents a 0 or 1 bit depending on the direction of 
magnetization. A combination of bits on the tracks, at any point along the length of the tape, 
represents a character. The number of bits per inch that can be written on the tape is known as 
tape density and is expressed as bpi (bits per inch). 

Magnetic tapes with densities of 800 bpi and 1600 bpi were in common use during the earlier days. 

The magnetic tape device consists of two spindles. While one spindle holds the source reel, the 
other holds the take up reel. During a forward read/write operation, the tape moves from the 
source reel to the take up reel. Fig. 17.1 illustrates a schematic diagram of the magnetic 
tape drive. 

Fig. 17.1 Schematic diagram of a magnetic tape drive (labels: source reel, take up reel, tape 
movement, read head, write head) 

The data to be stored on a tape is written on to it in blocks. These blocks may be of fixed or 
variable size. A gap of 3⁄4” is left between the blocks and is referred to as Inter Block Gap 
(IBG). The IBG is long enough to permit the tape to accelerate from rest and reach its normal 
speed before it begins to read the next block. Figure 17.2 shows the IBG of a tape. 





Fig. 17.2 Inter Block Gap of a tape (labels: blocks of data, magnetic tape, inter block gaps) 


Magnetic tape is a sequential device since having read a block of data, if one desires to read 
another block that is several feet down the tape, then it is essential to fast forward the tape until 
the correct block is reached. Again if we desire to read blocks of data that occur towards the 
beginning of the tape, then it is essential that the tape is rewound and the reading starts from the 
beginning onwards. In these aspects the characteristic of tapes is similar to that of audio cassettes. 


Magnetic disks 


Magnetic disks are still in vogue and are commonly referred to as hard disks, these days. Hard disks 
are random access storage devices. This means that hard disks store data in such a manner that 
they permit both sequential access as well as random or direct access of data. 

A disk pack is mountable on a disk drive and comprises platters which are similar to 
phonograph records. The number of platters in a disk pack varies according to its capacity. 
Figure 17.3 shows a schematic diagram of a disk pack comprising 6 platters. 

Recording of data is done on all surfaces of the platters except the outer surfaces of the first and 
last platter. Thus for a 6-platter disk pack, of the 12 surfaces available, data recording is done only 
on 10 of the surfaces. Each surface is accessed by a read/write head. The access assembly comprises 
a set of access arms, each ending in a read/write head. The access assembly moves in and 
out together with the access arms, so that all the read/write heads at any point of time are 
stationed at the same position on their surfaces. During a read/write operation, the read/write head 
is held stationary over the appropriate position on the surface, while the disk rotates at high 
speed to enable the read/write operation. Disk speeds ranging from 3000 rpm to 4900 rpm are 
common these days. 

Each surface of the platter, like a phonograph record, is made up of concentric circles of tracks 
of decreasing radii, on which the data is recorded. Modern versions of the hard disk contain tens 
of thousands of tracks per surface. The tracks are numbered from 0 beginning from the outer edge 
of the platter. The collection of tracks of the same radius, occurring on all the surfaces of the disk 
pack, is referred to as a cylinder (Refer Fig. 17.3). Thus a disk pack is virtually viewed as a 
collection of cylinders of decreasing radii. Each track is divided into sectors, a sector being the 
smallest addressable segment of a track. Typically a sector can hold approximately 512 bytes of 
data. The early disk packs had all tracks holding the same number of sectors. The modern versions 
have however done away with this feature to increase the storage capacity of the disk. 

Fig. 17.3 Schematic diagram of a disk pack (labels: access assembly, access arm, read/write head, 
spindle, platter, tracks, cylinders) 

To access information on a disk, it is essential to first specify the cylinder number, followed by 
the track number and the sector number. A multilevel index based ISAM file organization 
(see Sec. 15.7) is adopted for obtaining the physical locations of records stored in the disk. The 
cylinder index records the highest key in each cylinder and the cylinder number. The surface index 
or the track index stores the highest key in each track and the track number. Finally the sector index 
records the highest key in each sector and the sector number. In practice, each of the index entries 
also contains other spatial information to help locate the records efficiently. Thus the cylinder, 
track and sector indexes form a hierarchy of indexes which help identify the physical location of 
the record. 

The read/write head moves across the cylinders to position itself on the right cylinder. The 
time taken to position the read/write head on the correct cylinder is known as seek time. Once the 
read/write head has positioned itself on the correct track of the cylinder, it has to wait for the 
right sector in the track to appear under the corresponding read/write head. The time taken for 
the right sector to appear under the read/write head is known as latency time or rotational delay. 
Once the sector is reached, the corresponding data are read or written on to the disk. The time 
taken for the transfer of data to and from the disk is known as data transmission time. 
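
The three components of disk access time can be put together in a small calculation. The sketch below is illustrative only: the function name and the sample figures are hypothetical, and it assumes that, on average, the desired sector is half a revolution away from the read/write head.

```python
def access_time_ms(seek_ms, rpm, transfer_ms):
    """Estimated time to access one block: seek + rotational delay + transfer.

    Assumption (not from the text): the average latency is taken as the time
    for half a revolution of the disk.
    """
    rotation_ms = 60_000 / rpm           # time for one full revolution
    average_latency_ms = rotation_ms / 2
    return seek_ms + average_latency_ms + transfer_ms


if __name__ == "__main__":
    # Hypothetical drive: 10 ms seek, 3000 rpm, 1 ms to transfer the block.
    print(access_time_ms(seek_ms=10, rpm=3000, transfer_ms=1))
```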


Sorting with Tapes: Balanced Merge 17.3 





Balanced merge sort makes use of an internal sorting technique to generate the runs and employs 
merging to gather the runs for the next phase of the sorting. The repeated run generation and 
merging continue until a single run generated in the final phase delivers the sorted file. In this 
section we discuss balanced merge when the file resides on a tape. Besides the input tape, the 
sorting method has to make use of a few more work tapes to hold the runs that are generated 
from time to time and to perform the merging of the runs as well. Example 17.1 illustrates 
balanced merge sort on tapes. The sorting method makes use of 2-way merge to gather the runs. 


Example 17.1 Let us suppose we had to sort a file of 50,000 records (R1, R2, R3, ..., R50000) 
which is available on a tape (Tape T0) using balanced 2-way merge sort. Assume that the internal 
memory can hold only 10,000 records. Also let us suppose that there are 4 work tapes (T1, T2, T3, 
T4) available to assist in the sorting process. Rij indicates the j-th run in the i-th phase of the 
sorting. ↑ indicates the read/write head position on the tape. The steps in the sorting process are 
listed below: 

Step 1: Rewind all tapes and mount tapes T0, T1 and T2 onto the tape drives. 

Step 2: Phase 1: Read blocks of 10,000 records each from tape T0 and internally sort them to 
generate runs. Let R11 (R1 ... R10000), R12 (R10001 ... R20000), R13 (R20001 ... R30000), 
R14 (R30001 ... R40000) and R15 (R40001 ... R50000) be the five runs that are generated. 
Distribute the runs alternately onto tapes T1 and T2. The distribution of runs on the 
tapes T1 and T2 is as shown below: 

Tape T1 | R1 ... R10000 | R20001 ... R30000 | R40001 ... R50000 
Tape T2 | R10001 ... R20000 | R30001 ... R40000 

Step 3: Dismount tape T0 and rewind tapes T1 and T2. Mount tapes T1, T2, T3, T4 onto the 
drives. Here T1, T2 are the input tapes and T3, T4 are the output tapes. 

Step 4: Phase 2: Merge runs on tapes T1 and T2 using a 2-way merge to obtain longer runs 
R21 (R1 ... R20000), R22 (R20001 ... R40000) and R23 (R40001 ... R50000). The distribution of 
runs on the output tapes T3, T4 is shown below. Note that run R23 is simply copied 
onto tape T3 and is just a dummy run. 

Tape T3 | R1 ... R20000 | R40001 ... R50000 
Tape T4 | R20001 ... R40000 

Step 5: Rewind all tapes T1, T2, T3, T4. Mount T3, T4 as the input tapes and T1, T2 as the output 
tapes. 

Step 6: Phase 3: Merge runs on tapes T3 and T4 using a 2-way merge to obtain runs 
R31 (R1 ... R40000) and R32 (R40001 ... R50000). The distribution of runs on the output tapes 
T1, T2 is shown below. Note that run R32 is simply copied onto tape T2 and is just a 
dummy run. 

Tape T1 | R1 ... R40000 
Tape T2 | R40001 ... R50000 

Step 7: Rewind all tapes T1, T2, T3, T4. Mount T1, T2 as the input tapes and T3 as the output 
tape. 

Step 8: Phase 4: Merge runs on tapes T1 and T2 using a 2-way merge to obtain the final run 
R41 (R1 ... R50000). The final run is written onto tape T3. 

Tape T3 | R1 ... R50000 
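
The distribution of runs in Example 17.1 can be traced with a short simulation. The sketch below is an assumption-laden illustration rather than a tape-handling program: it tracks only the lengths of the runs, represents the two input tapes of a phase as a pair of Python lists, and the function name `balanced_two_way_tape_merge` is hypothetical.

```python
def balanced_two_way_tape_merge(run_sizes):
    """Trace balanced 2-way merge on tapes, tracking run lengths only.

    Returns one entry per phase; each entry is [runs_on_first_tape,
    runs_on_second_tape] for the pair of tapes written in that phase.
    """
    # Phase 1: distribute the initial runs alternately onto two tapes.
    bank = [run_sizes[0::2], run_sizes[1::2]]
    phases = [bank]
    while len(bank[0]) + len(bank[1]) > 1:
        first, second = bank
        out = [[], []]
        for j in range(len(first)):
            # Merge the j-th runs of both tapes; an unpaired run is copied.
            partner = second[j] if j < len(second) else 0
            out[j % 2].append(first[j] + partner)  # alternate the output tapes
        bank = out
        phases.append(bank)
    return phases


if __name__ == "__main__":
    for number, bank in enumerate(balanced_two_way_tape_merge([10000] * 5), 1):
        print("Phase", number, bank)
```

For five initial runs of 10,000 records the trace shows four phases, ending with a single run of 50,000 records.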


Buffer handling 


While merging runs in the balanced merge sort procedure, it needs to be observed that due to the 
limited capacity of the internal memory of the computer, it is not always possible to completely 
accommodate the runs and the merged list in it. In fact the problem gets severe as the phases in 
the sort procedure progress, since the runs get longer and longer. 

To tackle this problem, in the case of 2-way merge let us say, we trifurcate the internal memory 
into blocks known as buffers. Two of these blocks will be used as input buffers and the third as the 
output buffer. During the merge of two runs R1 and R2, for example, as many records as can be 
accommodated in the two input buffers are read from the runs R1 and R2 respectively. The 
merged records are sent to the output buffer. Once the output buffer is full, the records are 
written on to the disk. If during the merging process, any of the input buffers gets empty, it is 
once again filled with the rest of the records from the runs. 


Example 17.2 Let us consider the merge of two runs R1 and R2, each of which holds 500 
records. The output run would contain 1000 records after merging. Let us suppose the internal 
memory of the computer can hold only 750 records. To undertake merging we divide the internal 
memory into two input buffers and an output buffer, each of which can hold 250 records. 
The merge process is shown below: 

Input runs     | R1: 1-250, 251-500 | R2: 1-250, 251-500 
Input buffers  | 250 records | 250 records 
Output buffer  | 250 records 
Merged run     | 1-250 | 251-500 | 501-750 | 751-1000 (1000 records in all) 

The input buffers read in 250 records each, from the two runs R1 and R2 respectively. The merging 
which yields 500 records is emptied by the output buffer into the disk as two blocks of 250 
records each. The final merged run which contains 1000 records is in fact a collection of 4 blocks 
of merged records each containing 250 records. 

Example 17.2 presented a simple view of buffer handling. In reality, issues such as proper 
choice of buffer lengths, efficient utilization of program buffers to enable maximum overlapping 
of input/output and CPU processing times need to be attended to. 
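
The simple buffer scheme of Example 17.2 can be sketched as follows. This is a minimal illustration, assuming the two runs are Python lists standing in for data on external storage; the function name `buffered_merge` and its helper names are hypothetical, and no attempt is made to overlap input/output with CPU processing.

```python
def buffered_merge(run1, run2, buf_size):
    """Merge two sorted runs using two input buffers and one output buffer.

    Returns the list of blocks written out, each holding at most buf_size records.
    """
    runs = [run1, run2]
    pos = [0, 0]             # next unread record of each run on external storage
    inbuf = [[], []]         # the two input buffers
    outbuf = []              # the output buffer
    blocks_written = []      # stands in for the blocks written to external storage

    def refill(i):
        """Read the next block of run i into input buffer i."""
        inbuf[i] = runs[i][pos[i]:pos[i] + buf_size]
        pos[i] += len(inbuf[i])

    def flush():
        """Write the output buffer out as one block and empty it."""
        if outbuf:
            blocks_written.append(outbuf[:])
            outbuf.clear()

    refill(0)
    refill(1)
    while inbuf[0] or inbuf[1]:
        # Pick the buffer whose front record has the smaller key.
        if not inbuf[1] or (inbuf[0] and inbuf[0][0] <= inbuf[1][0]):
            i = 0
        else:
            i = 1
        outbuf.append(inbuf[i].pop(0))
        if not inbuf[i]:
            refill(i)        # stays empty once run i is exhausted
        if len(outbuf) == buf_size:
            flush()
    flush()                  # write out a final, partially filled block if any
    return blocks_written


if __name__ == "__main__":
    r1 = list(range(0, 1000, 2))   # 500 records
    r2 = list(range(1, 1000, 2))   # 500 records
    blocks = buffered_merge(r1, r2, buf_size=250)
    print(len(blocks), [len(b) for b in blocks])
```

With two runs of 500 records and buffers of 250 records, the merged run is written out as 4 blocks of 250 records, as in the example.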




Balanced P-way merging on tapes 


In the case of a balanced 2-way merge, if M runs were produced in the internal sorting phase and 
if $2^{k-1} < M \le 2^{k}$, then the sort procedure makes $k = \lceil \log_2 M \rceil$ merging passes 
over the data records. 

Now balanced merging can easily be generalized to the inclusion of T tapes, $T \ge 3$. We divide 
the T tapes into two groups, with P tapes on the one side and (T-P) tapes on the other, where 
$1 \le P < T$. The initial runs generated after internal sorting are evenly distributed on to the P 
tapes in the first group. A P-way merge is undertaken and the resulting runs are evenly distributed 
on to the next group containing (T-P) tapes. This is followed by a (T-P)-way merge of the runs 
available on the (T-P) tapes, with the output runs getting evenly distributed on to the P tapes of 
the first group and so on. However, it has been proved that $P = \lceil T/2 \rceil$ is the best choice. 
Illustrative Problem 17.4 discusses an example. Though balanced merging can be quite simple in its 
implementation, it needs to be seen if better merging patterns which save on time and resource 
can be evolved for the specific cases in hand. Illustrative Problems 17.5 and 17.6 discuss cases. 





Sorting with Disks: Balanced Merge 17.4 


Tapes being sequential access devices, the balanced merge sort methods had to employ sizable 
resources for the efficient distribution of runs besides spending time for mounting, dismounting 
and rewinding tapes. In the case of disks, which are random access storage devices, we are spared 
this burden. The seek time and latency time to access blocks of data from a disk are 
negligible compared to the time taken to access blocks of data on tapes. 

The balanced merge sort procedure for disk files, though similar in principle to that of tape 
files, is a lot simpler. The runs generated by the internal sorting methods are repeatedly merged 
until a single run emerges with the entire file sorted in the final pass. Example 17.3 demonstrates 
balanced merge sort on a disk file. 


Example 17.3 Let us suppose a file comprising 4500 records (R1, R2, R3, ..., R4500) is available 
on a disk. The internal memory of the computer can accommodate only 750 records. Another disk 
is available as a scratch pad. The input disk is not to be written on. Making use of buffer handling, 
we presume that during internal sorting as well as merging, blocks of data comprising 250 
records each are read/written. Rij indicates the j-th run generated in the i-th pass. The steps involved 
in undertaking balanced 2-way merge for sorting the file are shown below: 

Step 1: Read three blocks of data (totally 750 records) at a time from the file residing on the 
disk. Internally sort the blocks in the internal memory of the computer to generate 6 
runs viz., R01, R02, R03, R04, R05, R06. Write the runs onto the scratch disk. 

Step 2: Trifurcate the internal memory into two input buffers and a single output buffer each 
capable of holding 250 records. 

Step 3: Read runs from the disk and merge them pair wise, appropriately making use of 
buffer handling during the merging process and write the output runs onto the scratch 
disk. 

Step 4: Repeat step 3 until a single run emerges, holding the entire sorted file. The merging 
passes are schematically shown in Fig. 17.4. 





Level                      | Runs                    | Size of each run 
Initial generation of runs | R01 R02 R03 R04 R05 R06 | 3 blocks, 750 records 
Pass 1 merge               | R11 R12 R13             | 6 blocks, 1500 records 
Pass 2 merge               | R21 R22                 | R21: 12 blocks, 3000 records; R22: 6 blocks, 1500 records 
Pass 3 merge               | R31                     | 18 blocks, 4500 records 

Fig. 17.4 Balanced merge sort: Merging the runs (Example 17.3) 


Balanced k-way merging on disks 


As discussed in Sec. 17.3, balanced 2-way merge sort can be generalized to k-way merging. For 
a 2-way merge, as can be deduced from Fig. 17.4, the number of passes over data is given by 
$\lceil \log_2 M \rceil$, where M is the number of runs in the first level of the merge tree. A higher 
order merge can serve to reduce the number of passes over data. Thus in the case of k-way merge, 
$k \ge 2$, the number of passes is given by $\lceil \log_k M \rceil$, where M is the number of runs. 
Figure 17.5 shows the merge tree for k = 4, for an initial generation of 16 runs in a specific case. 
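
The levels of a balanced k-way merge tree, and hence the number of passes, can be computed with a small sketch such as the one below. It tracks run sizes only, and the function name `merge_tree_levels` is hypothetical.

```python
def merge_tree_levels(run_sizes, k=2):
    """Return the run sizes at every level of a balanced k-way merge tree.

    Level 0 holds the initial runs; each further level is one pass over the data.
    k is assumed to be at least 2.
    """
    levels = [list(run_sizes)]
    while len(levels[-1]) > 1:
        previous = levels[-1]
        # Merge the runs k at a time; the last group may hold fewer than k runs.
        levels.append([sum(previous[i:i + k]) for i in range(0, len(previous), k)])
    return levels


if __name__ == "__main__":
    levels = merge_tree_levels([750] * 6, k=2)   # the runs of Example 17.3
    for level in levels:
        print(level)
    print("passes:", len(levels) - 1)
```

For the 6 runs of Example 17.3 this gives 3 passes with k = 2, and for 16 initial runs it gives 2 passes with k = 4, in agreement with the ceiling of the logarithm.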


Fig. 17.5 Balanced k-way merge sort: Merging the runs for k = 4 (initial runs R01 to R16; pass 1 
merges them four at a time into 4 runs; pass 2 merges these into a single run) 


Though k-way merge can significantly reduce input/output due to the reduction in the 
number of passes, it is not without its ill effects. Let us suppose R1, R2, R3, ..., Rk are the k runs 
generated initially, with run Ri of size $r_i$, $1 \le i \le k$. During a k-way merge the next record 
which is to be output is the one with the smallest key. A direct method to find the smallest key 
would call for (k-1) comparisons. The computing time to merge the k runs would be given by 
$O\left((k-1)\sum_{i=1}^{k} r_i\right)$. Since $\lceil \log_k M \rceil$ passes are being made, the total 
number of key comparisons is given by $n(k-1)\log_k M$, where n is the total number of records in 
the source file. We have 

$$n(k-1)\log_k M = n(k-1)\,\frac{\log_2 M}{\log_2 k}$$ 

In other words for a k-way merge sort, the number of key comparisons increases by a factor of 
$\frac{k-1}{\log_2 k}$. Thus for large k ($k \ge 6$) the CPU time needed to 
perform the k-way merge will outweigh the reduction achieved in input/output time due to the 
reduction in the number of passes. A significant reduction in the number of comparisons to find 
the smallest key can be achieved by using what is known as a selection tree. 
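
The cost of the direct method can be observed with a short sketch that merges k runs by scanning their front keys and counting the key comparisons. The function names `direct_k_way_merge` and `comparison_factor` are hypothetical, and the runs are assumed to be in-memory Python lists.

```python
import math


def direct_k_way_merge(runs):
    """Merge k sorted runs by scanning the front key of every run for each output.

    Returns the merged list and the number of key comparisons made, which is
    at most (k - 1) for each record that is output.
    """
    pos = [0] * len(runs)
    merged, comparisons = [], 0
    while True:
        best = None
        for i, run in enumerate(runs):
            if pos[i] == len(run):
                continue                      # run i is exhausted
            if best is None:
                best = i
            else:
                comparisons += 1
                if run[pos[i]] < runs[best][pos[best]]:
                    best = i
        if best is None:                      # every run is exhausted
            return merged, comparisons
        merged.append(runs[best][pos[best]])
        pos[best] += 1


def comparison_factor(k):
    """The factor (k - 1) / log2(k) by which key comparisons grow for a k-way merge."""
    return (k - 1) / math.log2(k)


if __name__ == "__main__":
    merged, count = direct_k_way_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
    print(merged, count)
    for k in (2, 4, 8, 16):
        print(k, comparison_factor(k))
```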


Selection tree 


A selection tree is a complete binary tree which serves to obtain the smallest key from among a 
set of keys. Each internal node represents the smaller of its two children and external nodes 
represent the keys from which the selection of the smallest key needs to be made. The root node 
represents the smallest key that was selected. 

Figure 17.6(a) represents a selection tree for an 8-way merge. The eight lists to be merged are 
L1(5, 7, 8), L2(6, 9, 9), L3(2, 4, 5), L4(1, 7, 8), L5(3, 6, 9), L6(5, 5, 6), L7(3, 4, 9), L8(6, 8, 9). The 
external nodes represent the first set of 8 keys that were selected from the lists. Progressing from 
the bottom up, each of the internal nodes represents the smaller key of its two children until at the 
root node the smallest key gets automatically represented. The construction of the selection tree 
can be compared to a tournament being played with each of the internal nodes recording the 
winners of the individual matches. The final winner is registered by the root node. A selection 
tree therefore, is also referred to as a tree of winners. 

In this case, the smallest key viz., 1 is dropped into the output list. Now the next key from L4 
viz., 7 enters the external node. It is now essential to restructure the tree to determine the next 
winner. Observe how it is now sufficient to restructure only that portion of the tree occurring 
along the path from the node numbered 11 to the root node. The revised key values of the internal 
nodes are shown in Fig. 17.6(b). Note how in this case, the keys compared and revised along the 
path are (2, 7), (5, 2), (2, 3). The root node now represents 2 which is the next smallest key. 

In practice, the external nodes of the selection tree are represented by the records and the 
internal nodes are only pointers to the records which are winners. (For ease of understanding the 
internal nodes in Fig. 17.6 were represented using the keys themselves, though in reality they are 
only pointers to the winning records.) 
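
A tree of winners for the 8-way merge discussed above can be sketched as follows. This is one possible array-based layout, not necessarily the representation intended in the text: node n has children 2n and 2n+1, the external nodes are numbered k to 2k-1, each node stores the index of the winning list (playing the role of the pointer), and k is assumed to be a power of two. The function name `winner_tree_merge` is hypothetical.

```python
INF = float("inf")   # key reported for an exhausted list


def winner_tree_merge(lists):
    """k-way merge using a tree of winners; k is assumed to be a power of two."""
    k = len(lists)
    pos = [0] * k                        # next unread key of each list

    def key(i):
        return lists[i][pos[i]] if pos[i] < len(lists[i]) else INF

    def play(n):
        """Record at node n the winner (smaller key) of its two children."""
        left, right = tree[2 * n], tree[2 * n + 1]
        tree[n] = left if key(left) <= key(right) else right

    tree = [0] * (2 * k)                 # tree[n]: index of the list winning at node n
    for i in range(k):
        tree[k + i] = i                  # external nodes k .. 2k-1
    for n in range(k - 1, 0, -1):        # initial tournament, bottom up
        play(n)

    merged = []
    while key(tree[1]) != INF:
        winner = tree[1]
        merged.append(key(winner))
        pos[winner] += 1                 # the next key of that list enters its leaf
        n = (k + winner) // 2
        while n >= 1:                    # replay only along the path to the root
            play(n)
            n //= 2
    return merged


if __name__ == "__main__":
    lists = [[5, 7, 8], [6, 9, 9], [2, 4, 5], [1, 7, 8],
             [3, 6, 9], [5, 5, 6], [3, 4, 9], [6, 8, 9]]
    print(winner_tree_merge(lists))
```

With this numbering, list L4 sits at external node 11, so replacing its key replays the matches at nodes 5, 2 and 1 only.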

Despite its merits a selection tree can result in increased overheads associated with 
maintaining the tree. This happens especially when the restructuring of the tree takes place to 
determine the next winner. It can be seen that when the next key walks into the tree, 
tournaments have to be played between sibling nodes which were losers earlier. 

Note how in the case of 7 entering the tree, the tournaments played were between (2, 7), (5, 2) 
and (2, 3), where 2, 5 and 3 were losers in the earlier case. It would therefore be prudent if the 
internal nodes could represent the losers rather than the winners. A tournament tree in which 
each internal node retains a pointer to the loser is called a tree of losers. 





Fig. 17.6 Selection tree for an 8-way merge: (a) the smallest key (key 1) is the winner; 
(b) restructuring the tree to determine the next winner 


Figure 17.7 (a) illustrates the tree of losers for the selection tree discussed in Fig. 17.6. Node 
0 is a special node which shows the winner. As said earlier, each of the internal nodes is shown 
carrying the key when in reality they represent only pointers to the loser records. To determine 
the smallest key, as before, a tournament is played between pairs of external nodes. Though the 
winners are ‘remembered’, it is the losers that the internal nodes are made to point to. Thus nodes 
numbered ( (4), (5), (6), (7)) record pointers to the losing external nodes viz., the ones with the 
key values of 6, 2, 5, 6 respectively. Now node numbered (2) conducts a tournament between the 
two winners of the earlier game viz., key values 5 and 1 and records the pointer to the loser which 
is 5. In a similar way, node numbered (3) records the pointer to the loser node with key value 3. 
Progressing in this way the tree of losers is constructed and node 0 outputs the winning key value 
which is the smallest. 

Once the smallest key viz., 1 has been output and the next key 7 enters the tree, the 
restructuring is easier now, since the sibling nodes with which the tournaments are to be played 
are losers and these are directly pointed to by the internal nodes. The restructured tree is shown 
in Fig. 17.7(b). 
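
A tree of losers can be sketched along the same lines. The layout below is an assumption made for illustration: internal node n (1 to k-1) stores the index of the list that lost the match played at that node, node 0 stores the overall winner, the external node of list i is numbered k+i, and k is assumed to be a power of two. The function name `loser_tree_merge` is hypothetical.

```python
INF = float("inf")   # key reported for an exhausted list


def loser_tree_merge(lists):
    """k-way merge using a tree of losers; k is assumed to be a power of two."""
    k = len(lists)
    pos = [0] * k                        # next unread key of each list

    def key(i):
        return lists[i][pos[i]] if pos[i] < len(lists[i]) else INF

    loser = [0] * k                      # loser[n]: loser at node n; loser[0]: winner

    def build(n):
        """Play the tournament below node n, store the loser, return the winner."""
        if n >= k:
            return n - k                 # external node n holds list n - k
        a, b = build(2 * n), build(2 * n + 1)
        if key(a) <= key(b):
            loser[n] = b
            return a
        loser[n] = a
        return b

    loser[0] = build(1)
    merged = []
    while key(loser[0]) != INF:
        contender = loser[0]
        merged.append(key(contender))
        pos[contender] += 1              # the next key of the winning list enters
        n = (k + contender) // 2
        while n >= 1:
            # The opponent at each node is the loser stored there; no sibling lookup.
            if key(loser[n]) < key(contender):
                loser[n], contender = contender, loser[n]
            n //= 2
        loser[0] = contender
    return merged


if __name__ == "__main__":
    lists = [[5, 7, 8], [6, 9, 9], [2, 4, 5], [1, 7, 8],
             [3, 6, 9], [5, 5, 6], [3, 4, 9], [6, 8, 9]]
    print(loser_tree_merge(lists))
```

The replay loop touches only the nodes on the path from the external node to the root and reads the opponent directly from each node, which is the saving over the tree of winners described above.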





Fig. 17.7 Tree of losers for the 8-way merge: (a) the smallest key (key 1) is output; 
(b) restructured tree after the smallest key is output 


Polyphase Merge Sort 17.5 





Balanced k-way merge sort on tapes calls for an even distribution of runs on the tapes and to 
enable efficient merging requires 2k tapes to avoid wasteful passes over data. Thus while k tapes 
act as input devices holding the runs generated, the other k tapes act as output devices to receive 
the merged runs. The k tape groups swap roles in the successive passes until a single run emerges 
in one of the tapes, signaling the end of sort. 

It is possible to avoid wasteful redistribution of runs on the tapes while using less than 2k tapes 
by a wisely thought out run redistribution strategy. Polyphase merge is one such external sorting 
method that makes use of an intelligent redistribution of runs during merging, so much that a 
k-way merge requires only (k+1) tapes! 







The central principle of the method is to ensure that in each pass (except the last of course!) 
during the merge, the runs are to be cleverly distributed so that one tape is always rendered 
empty while the other k tapes hold the input runs that are to be merged! The empty tape for the 
current pass acts as the output tape for the next pass and so on. Ultimately, as in balanced merge 
sort, the final pass delivers only one run in one of the tapes. 

At this point of time we introduce a useful notation mentioned in the literature to enable a 
crisp presentation of run distribution. Runs that are initially generated by internal sorting are 
thought to be of length 1 (unit of measure). Thus if there are t runs that are initially generated 
then the notation would describe it as $1^{t}$. For example, if there were 34 runs that were initially 
generated then it would be represented as $1^{34}$. Similarly, if after a merge there were 14 runs of 
size 2, it would be represented as $2^{14}$. In general, t runs of size s would be represented as $s^{t}$. 

Example 17.4 illustrates polyphase merge on 3 tapes. 


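Example 17.4 Let us suppose a source file was initially sorted to generate 34 runs of size 
1 ($1^{34}$). We demonstrate polyphase merge on 3 tapes (T1, T2, T3) undertaking a 2-way merge 
during each phase. Table 17.1 shows the redistribution of runs on the tapes in each phase. 


Table 17.1 Polyphase merge on 3 tapes: redistribution of runs 


Phase | T1        | T2        | T3        | Remarks 
1     | $1^{13}$  | $1^{21}$  | empty     | Initial distribution 
2     | empty     | $1^{8}$   | $2^{13}$  | Merge to T3 
3     | $3^{8}$   | empty     | $2^{5}$   | Merge to T1 
4     | $3^{3}$   | $5^{5}$   | empty     | Merge to T2 
5     | empty     | $5^{2}$   | $8^{3}$   | Merge to T3 
6     | $13^{2}$  | empty     | $8^{1}$   | Merge to T1 
7     | $13^{1}$  | $21^{1}$  | empty     | Merge to T2 
8     | empty     | empty     | $34^{1}$  | Merge to T3 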

Note how in phase 8, polyphase merge successfully completes its sorting by creating the final run 
of sorted records. Also observe how in each phase one of the tapes is rendered empty while the 
other two are non empty. Now what is the trick behind this procedure? 

Let us suppose that ‘intuitively’ we decided to distribute 13 runs of size 1 and 21 runs of size 
1 onto tapes T1 and T2 respectively. In phase 2, because it is a 2-way merge and polyphase merge 
expects one tape to fall vacant in every phase, we use up all the 13 runs of size 1 in tape T1 for 
a merge operation with an equivalent number of runs in tape T2. This yields 13 runs of double 
the size ($2^{13}$) which is written on to the empty tape T3. That leaves 8 runs of size 1 on tape T2 
that could not be used up and renders tape T1 empty. Again in phase 3, the $1^{8}$ runs in tape T2 are 
merged with an equivalent number of runs in tape T3 to obtain $3^{8}$ which is written on to tape T1. 
This leaves a balance of $2^{5}$ runs on tape T3 and renders tape T2 empty. The phases continue until 
in phase 8 a single run $34^{1}$ gets written on to tape T3. 

To determine how the initial distribution of $1^{13}$ and $1^{21}$ was conceived, we work backwards 
from the last phase. Let us suppose there were n phases for a 3-tape case. In the n-th phase, we 
should arrive at exactly one run on a tape T1 (let us say) with tapes T2 and T3 totally empty. This 
implies that in phase (n-1) there should have been two runs of size 1, one each on tapes T2 and T3, 
which should have been merged as a single run on T1 in the n-th phase. Continuing in this fashion we 
obtain the initial distribution of runs to be $1^{13}$ and $1^{21}$ on the two tapes respectively. 
Table 17.2 lists the run distribution for a 3-tape polyphase merge. 


Table 17.2 Run distribution for a 3-tape polyphase merge 





It can be easily seen that the number of runs needed for an n-phase merge is given by 
$F_n + F_{n-1}$, where $F_i$ is the i-th Fibonacci number. Hence this method of redistribution of 
runs is known as Fibonacci merge. The method can be clearly generalized to k-way merging on 
(k+1) tapes using generalized Fibonacci numbers. 
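
The backward derivation of the initial distribution and the forward phases of Example 17.4 can both be traced with a short sketch. It makes several simplifying assumptions: only run sizes and run counts are tracked, each tape is represented as a pair (run size, run count) or `None` when empty, the initial distribution is assumed to be a perfect Fibonacci distribution, and the function names are hypothetical.

```python
def fibonacci_distribution(phases):
    """Work backwards from the last phase (a single run) to the initial run counts."""
    a, b = 0, 1                      # last phase: one run on one tape
    for _ in range(phases - 1):
        a, b = b, a + b              # one phase earlier
    return a, b


def polyphase_3tape(count_t1, count_t2):
    """Trace 2-way polyphase merge on 3 tapes for a perfect initial distribution.

    Returns the tape states phase by phase; a tape is (run_size, run_count) or None.
    """
    tapes = [(1, count_t1), (1, count_t2), None]
    phases = [list(tapes)]
    while sum(t is not None for t in tapes) > 1:
        out = tapes.index(None)                      # the empty tape receives output
        i, j = [n for n in range(3) if n != out]
        (size_i, count_i), (size_j, count_j) = tapes[i], tapes[j]
        m = min(count_i, count_j)                    # merge until one input empties
        tapes[out] = (size_i + size_j, m)
        tapes[i] = (size_i, count_i - m) if count_i > m else None
        tapes[j] = (size_j, count_j - m) if count_j > m else None
        phases.append(list(tapes))
    return phases


if __name__ == "__main__":
    t1, t2 = fibonacci_distribution(8)
    print("initial distribution:", t1, t2)
    for number, state in enumerate(polyphase_3tape(t1, t2), 1):
        print("Phase", number, state)
```

For 8 phases the sketch yields the initial counts 13 and 21, and the trace ends with a single run of size 34 on the third tape.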


Cascade Merge Sort 17.6 


Cascade merge is another intelligent merge pattern that was discovered before polyphase merge. 
The merge pattern makes use of a perfectly devised initial distribution of runs on the tapes. While 
the polyphase merge sort employs a uniform merge pattern during the run generation, cascade 
merge sort makes use of a ‘cascading’ merge pattern in each of its passes. Thus for t tapes, while 
polyphase merge uniformly employs a (t-1)-way merge for the run generation, cascade sort employs 
a (t-1)-way merge, a (t-2)-way merge and so on in the same pass for its run generation. 

Example 17.5 demonstrates cascade merge on 6 tapes for an initial generation of 55 runs of 
length 1. We make use of the run distribution notation introduced in Sec. 17.5. 





Example 17.5 There are 6 tapes (T1, T2, T3, T4, T5, T6) using which 55 runs of length 1 (1^55)
are to be cascade merge sorted, to generate the final run (55^1).





Table 17.3 illustrates the run distribution of cascade merge. 


Table 17.3 Run distribution on 6 tapes by cascade merge 





As before, let us assume that the initial distribution of (1^15, 1^14, 1^12, 1^9, 1^5) runs on the
tapes (T1, T2, T3, T4, T5) was devised through some ‘intuitive’ means.

In pass 1, we undertake a series of merges. A 5-way merge on (T1, T2, T3, T4, T5) yields the runs
5^5, which are put onto tape T6. A 4-way merge on (T1, T2, T3, T4) yields 4^4, which is put onto tape
T5. A 3-way merge on (T1, T2, T3) yields 3^3, which is put onto tape T4. A 2-way merge on
(T1, T2) yields 2^2, which is put onto tape T3. Lastly, a 1-way merge (which is mere copying of the
balance run) on T1 yields 1^1, which is copied onto tape T2. Of course, one could do away with
the 1-way merge, which is a mere copying of the run, and retain the run in the tape itself. In pass
1, tape T1 falls empty.

In pass 2, we repeat the cascading merge wherein the 5-way merge on (T2, T3, T4, T5, T6) yields
the run 15^1, a 4-way merge on (T3, T4, T5, T6) yields 14^1 and so on until, at the end of pass 2, the
distribution of runs on the tapes is as shown in the table. This is the penultimate pass; observe
how the distribution records one run each on the tapes. In the final pass, as it always is, the
5-way merge releases a single run of size 55, which is the final sorted file.
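
The passes traced above can be reproduced with a small simulation. The sketch below is illustrative only: each tape is modelled as a plain Python list of run sizes, the function name `cascade_pass` is hypothetical, and a perfect cascade distribution with exactly one empty tape is assumed at the start of every pass.

```python
def cascade_pass(tapes):
    """Simulate one cascade-merge pass over a list of tapes.

    Each tape is a list of run sizes; exactly one tape is assumed empty.
    Returns a new list of tapes without modifying the input.
    """
    tapes = [list(t) for t in tapes]
    out = next(i for i, t in enumerate(tapes) if not t)
    # Input tapes, ordered from the most runs to the fewest.
    ins = sorted((i for i in range(len(tapes)) if i != out),
                 key=lambda i: -len(tapes[i]))
    while ins:
        last = ins[-1]            # tape with the fewest runs empties first
        n = len(tapes[last])
        # An m-way merge of n runs from each of the m remaining input tapes.
        merged = [sum(tapes[i][j] for i in ins) for j in range(n)]
        for i in ins:
            del tapes[i][:n]
        tapes[out] = merged
        out = last                # the emptied tape receives the next merge
        ins.pop()
    return tapes


if __name__ == "__main__":
    state = [[1] * 15, [1] * 14, [1] * 12, [1] * 9, [1] * 5, []]
    for p in range(1, 4):
        state = cascade_pass(state)
        print("pass", p, [f"{t[0]}^{len(t)}" if t else "-" for t in state])
```

Starting from (1^15, 1^14, 1^12, 1^9, 1^5), the first pass leaves 1^1, 2^2, 3^3, 4^4 and 5^5 on tapes T2 to T6, the second leaves one run each of sizes 15, 14, 12, 9 and 5, and the third leaves the single run 55^1.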

Now how does one arrive at the perfect initial distribution? As was done for polyphase merge,
this could be arrived at by working backwards from the goal state of (1, 0, 0, 0, 0) obtained during
the n-th pass. Table 17.4 illustrates the run distribution by cascade merge on 5 tapes.


Table 17.4 Run distribution on 5 tapes by cascade merge 
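
Working backwards from the goal state can be sketched as follows. This is an illustrative derivation, not a listing from the text: the name `cascade_distribution` is hypothetical, and the rule used (each earlier count is a running total of the later counts, taken in reverse order) is inferred from undoing the (t-1)-way, (t-2)-way, ... merges of one pass.

```python
from itertools import accumulate


def cascade_distribution(num_tapes, passes):
    """Run counts on the (num_tapes - 1) input tapes that allow a cascade
    merge to finish with a single run after `passes` passes."""
    counts = [1] + [0] * (num_tapes - 2)   # goal state, e.g. (1, 0, 0, 0, 0)
    for _ in range(passes):
        # Undo one pass: running totals of the current counts, reversed.
        counts = list(accumulate(counts))[::-1]
    return counts


if __name__ == "__main__":
    for p in range(5):
        d = cascade_distribution(6, p)
        print(p, d, sum(d))
```

For 6 tapes the successive distributions are (1, 0, 0, 0, 0), (1, 1, 1, 1, 1), (5, 4, 3, 2, 1), (15, 14, 12, 9, 5) and (55, 50, 41, 29, 15), with totals of 1, 5, 15, 55 and 190 runs.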





For an in-depth analysis of merge patterns and other external sorting schemes, the motivated
reader is referred to Donald Knuth, The Art of Computer Programming, Vol. III, Second edition, 2002.







Summary





> External sorting deals with sorting of files or lists that are too huge to be accommodated 
in the internal memory of the computer and hence need to be stored in external storage 
devices such as disks or drums. 


> The principle behind external sorting is to first make use of any efficient internal sorting
technique to generate runs. These runs are then merged in passes to obtain a single run, at
which stage the file is deemed sorted. The merge patterns called for by the strategies are
influenced by the external storage medium on which the runs reside, viz., disks or tapes.


> Magnetic tapes are sequential devices built on the principle of audio tape devices. Data is 
stored in blocks occurring sequentially. Magnetic disks are random access storage devices. 
Data stored in a disk is addressed by its cylinder, track and sector numbers. 


> Balanced merge sort is a technique that can be adopted on files residing on both disks and
tapes. In its general form, a k-way merge could be undertaken on the runs. For the
efficient management of merging runs, buffer handling and selection tree mechanisms are
employed.

> Balanced k-way merge sort on tapes calls for the use of 2k tapes for an efficient 
management of runs. Polyphase merge sort is a clever strategy that makes use of only (k+1) 
tapes to perform the k-way merge. The distribution of runs on the tapes follows a Fibonacci 
number sequence. 


> Cascade merge sort is yet another smart strategy which unlike polyphase merge sort does 
not employ a uniform merge pattern. Each pass makes use of a ‘cascading’ sequence of 
merge patterns. 


Illustrative Problems


Problem 17.1 The specification for a typical disk storage system is shown in Table I 17.1.
An employee file consisting of 100,000 records is stored on the disk. The employee record structure
and the size of the fields in bytes (shown in brackets) are given below:


Employee number (6) | Employee name (20) | Designation (10) | Address (30) | Basic pay (6) | Allowances (20) | Deductions (20) | Total salary (6)

(a) What is the storage space (in terms of bytes) needed to store the employee file in the disk?
(b) What is the storage space (in terms of cylinders) needed to store the employee file in the
disk?


Table I 17.1 Specification for a typical disk storage system
(only the entries quoted in the solutions are listed; the remaining rows, including the number
of plates, are illegible in the source)

Sector size                                  512 bytes
Sectors per track                            50
Tracks per cylinder                          10
Average seek time                            25 milliseconds
Average latency time                         8.33 milliseconds
Maximum latency time (one full revolution)   16.66 milliseconds
Time to read/write a sector                  0.33 milliseconds


Solution:
(a) The size of the employee record = 6 + 20 + 10 + 30 + 6 + 20 + 20 + 6 = 118 bytes
Number of whole employee records that can be held in a sector = 512/118 = 4 records (rounded down)
Number of sectors needed to hold the whole employee file = 100,000/4 = 25,000 sectors
The total number of bytes needed to store the file in the disk = 25,000 x 512
= 12,800,000 bytes
= 12.2 megabytes

(b) Number of tracks needed to hold the whole employee file, given that there are 50 sectors/
track = 25,000/50 = 500 tracks
Therefore, the number of cylinders needed to store the whole file, given that there are 10 tracks/
cylinder = 500/10 = 50 cylinders
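
The arithmetic of this solution can be checked with a short script. It is a sketch under the figures quoted in the solution (512-byte sectors, 50 sectors per track, 10 tracks per cylinder); the names `FIELD_SIZES` and `storage_requirements` are hypothetical.

```python
import math

# Field sizes in bytes, as listed in the problem statement.
FIELD_SIZES = {
    "employee_number": 6, "employee_name": 20, "designation": 10,
    "address": 30, "basic_pay": 6, "allowances": 20,
    "deductions": 20, "total_salary": 6,
}


def storage_requirements(num_records=100_000, sector_bytes=512,
                         sectors_per_track=50, tracks_per_cylinder=10):
    record_bytes = sum(FIELD_SIZES.values())
    # Records are assumed not to straddle sector boundaries.
    records_per_sector = sector_bytes // record_bytes
    sectors = math.ceil(num_records / records_per_sector)
    tracks = math.ceil(sectors / sectors_per_track)
    cylinders = math.ceil(tracks / tracks_per_cylinder)
    return {
        "record_bytes": record_bytes,
        "records_per_sector": records_per_sector,
        "sectors": sectors,
        "bytes": sectors * sector_bytes,
        "tracks": tracks,
        "cylinders": cylinders,
    }


if __name__ == "__main__":
    for name, value in storage_requirements().items():
        print(f"{name}: {value}")
```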





Problem 17.2 For the employee file discussed in Illustrative Problem 17.1, making use of
Table I 17.1, answer the questions given below:
Records from the employee file are to be read and, making use of the basic pay, allowances and
deductions, the total salary is to be computed for each employee. Assume that it takes
200 microseconds of CPU time to perform the computation for a single record. The updated
records are to be written onto the disk.

(a) What is the time taken to process a sector of records?
(b) Having processed a sector of records, what is the time taken to process all records in the
very next sector?
(c) What is the time taken to process the records in all sectors of a track, assuming that the
sectors are continuously read?
(d) What is the time taken to process all records in a cylinder?


Solution:

(a) The time taken to process a sector full of records =
(1) time taken to access the cylinder + (2) time taken to access the sector + (3) time
taken to read the records + (4) time taken to compute the total salary for the records + (5) time
taken to access the sector to write back the records + (6) time taken to write the updated
records onto the sector.

For (1) and (2), since the question pertains to an arbitrary sector, we choose to use the
average seek time of 25 milliseconds and the average latency time of 8.33 milliseconds,
respectively. For (3) and (6) the time taken is 0.33 milliseconds each. For (4) it is
0.8 milliseconds (200 microseconds x 4 records).

The computation of (5), which is in fact the time taken for the sector to reappear under the
read/write head to perform the write operation, is a trifle involved. It is computed as (the
maximum latency time, that is, the time taken for the track to make a full revolution) - (time
taken to read the sector) - (time taken to process the records by the CPU).
This is given by (16.66 - 0.33 - 0.8) = 15.53 milliseconds.
Therefore, the time taken to process all records in a sector =
25 + 8.33 + 0.33 + 0.8 + 15.53 + 0.33 = 50.32 milliseconds

(b) While the time taken to process records in the first sector (Question 17.2(a)) includes the
time taken to access the cylinder and the sector, to process the very next sector there is no
need to include the cylinder and sector access time, since the reading is done continuously.

Therefore, the time taken to process the records in the very next sector =
(3) time taken to read the records + (4) time taken to compute the total salary for
the records + (5) time taken to access the sector to write back the records + (6)
time taken to write the updated records onto the sector
= 0.33 + 0.8 + 15.53 + 0.33 = 16.99 milliseconds.

(c) The time taken to process all records on a track =
(7) time taken to process records in the first sector of the track +
(8) time taken to process records in the next sector of the track x 49 sectors
Here (7) and (8) have been obtained in Questions 17.2(a) and (b) respectively, and therefore
the result is given as
50.32 + 16.99 x 49 = 882.83 milliseconds.

(d) The time taken to continuously process all records in a cylinder calls for processing all
records track after track. Once the records in the first occurring track have been processed,
the rest of the tracks in the cylinder are instantaneously accessed.

Therefore, the time taken to process all records in a cylinder =
(9) time taken to process all records in the first track + (10) time taken to process
all records in the next track of the cylinder x 9 tracks
While (9) is found in Question 17.2(c), to compute (10) we simply need to use the time
computed in (8) for all the 50 sectors in the next track.
Therefore the result is given as 882.83 + 16.99 x 50 x 9 = 8528.33 milliseconds = 8.528 seconds.


Problem 17.3 Illustrative Problem 17.2(d) computed the time taken to process all records of 
the employee file residing in a cylinder. Assume that the time taken for the read/write head to 
move from one cylinder to another is 10 milliseconds. 

(a) What is the time taken to process all records in the next cylinder? 

(b) What is the time taken to process the entire employee file of records in the disk? 


Solution: 

(a) Having processed a cylinder of records, the time taken to move to the next cylinder is 
10 milliseconds. The time taken to process all records in the next cylinder is a straightforward 
computation given by, 

(8) time taken to process all records in the next sector x 50 sectors x 10 tracks
Here (8) is obtained in Question 17.2(b).
Therefore, the total time taken to process all records in the next cylinder, moving from the
current cylinder = 10 + (16.99 x 50 x 10) = 8505 milliseconds = 8.505 seconds
(b) The entire employee file resides on 50 cylinders (Illustrative Problem 17.1(b)).
Therefore the time taken to process the entire file =
(11) time taken to process records in the first cylinder + (12) time taken to process
records in the next cylinder x 49
(11) is obtained in Illustrative Problem 17.2(d) and (12) is obtained in Illustrative
Problem 17.3(a).
Therefore, the time taken to process the entire employee file = 8.528 + 8.505 x 49
= 425.273 seconds = 7.088 minutes
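
The chain of timing computations in Illustrative Problems 17.2 and 17.3 can be collected into one short script. The sketch below uses only the figures quoted in the solutions (all times in milliseconds); the function name `processing_times_ms` and its parameter names are hypothetical.

```python
def processing_times_ms(records_per_sector=4, sectors_per_track=50,
                        tracks_per_cylinder=10, cylinders=50):
    seek, latency, revolution = 25.0, 8.33, 16.66   # ms, from the solutions
    transfer = 0.33                # ms to read or write one sector
    cylinder_move = 10.0           # ms to step to the adjacent cylinder
    cpu = 0.2 * records_per_sector            # 200 microseconds per record
    # Wait for the same sector to come round again before writing it back.
    rewait = revolution - transfer - cpu
    next_sector = transfer + cpu + rewait + transfer
    first_sector = seek + latency + next_sector
    track = first_sector + (sectors_per_track - 1) * next_sector
    cylinder = track + (tracks_per_cylinder - 1) * sectors_per_track * next_sector
    next_cylinder = cylinder_move + tracks_per_cylinder * sectors_per_track * next_sector
    whole_file = cylinder + (cylinders - 1) * next_cylinder
    return {
        "first_sector": first_sector, "next_sector": next_sector,
        "track": track, "cylinder": cylinder,
        "next_cylinder": next_cylinder, "whole_file": whole_file,
    }


if __name__ == "__main__":
    t = processing_times_ms()
    for name, ms in t.items():
        print(f"{name}: {ms:.2f} ms")
    print(f"whole file: {t['whole_file'] / 60000:.3f} minutes")
```

Note that nearly the whole of each 16.99 milliseconds per sector is rotational delay: the sector must come back under the head before it can be rewritten.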


Problem 17.4 Given a file of 50,000 records with an internal memory capacity of 10,000
records, trace the steps of a Balanced P-way merge sort for T = 6 tapes (T1, T2, T3, T4, T5, T6) and
P = 3.


Solution: An internal sort of the file yields 5 runs of 10,000 records each. Since P = 3, we need
to undertake a 3-way merge. We therefore divide the 6 tapes into two groups of 3 tapes each. The
two groups alternate as the input and output tapes during the merge passes.

The initial distribution of runs on the tapes T1, T2 and T3 after internal sorting is as follows:

Tape T1: R1...R10000   R30001...R40000
Tape T2: R10001...R20000   R40001...R50000
Tape T3: R20001...R30000

Rewind the tapes T1, T2 and T3. In the next pass, the 3-way merge of runs in tapes T1, T2 and T3
yields output runs on T4, T5, T6 as follows:

Tape T4: R1...R30000
Tape T5: R30001...R50000
Tape T6: Empty

Rewind tapes T4 and T5. In the last pass a 3-way merge of runs in tapes T4 and T5 (T6 being empty)
yields the final run on tape T1 as follows:

Tape T1: R1...R50000


Problem 17.5 For a file comprising 50,000 records with an internal memory capacity of
10,000 records, the initial distribution of runs on two tapes T1 and T2 is as shown below:

Tape T1: R1...R10000   R20001...R30000   R40001...R50000
Tape T2: R10001...R20000   R30001...R40000

Two standby tapes, viz., T3 and T4, are available. The following two merge patterns were undertaken.
Which of these is efficient and why?


Merge pattern A

Pass 1 (2-way merge):
Tape T3: R1...R20000   R40001...R50000
Tape T4: R20001...R40000
Rewind tapes T1, T2, T3, T4.

Pass 2 (2-way merge):
Tape T1: R1...R40000
Tape T2: R40001...R50000
Rewind tapes T1, T2 and T3.

Pass 3 (2-way merge):
Tape T3: R1...R50000


Merge pattern B

Pass 1 (2-way merge):
Tape T3: R1...R20000
Tape T4: R20001...R40000
Tape T1: R1...R10000   R20001...R30000 ↑ R40001...R50000
Rewind tapes T2, T3, T4 only. Tape T1 retains the run R40001...R50000. The ↑ indicates the
position of the read/write head from which point onwards T1 would be read for the next pass.

Pass 2 (3-way merge of tapes T1, T3 and T4):
Tape T2: R1...R50000


Solution: Merge pattern B is efficient since the total number of records that were read to obtain
the final run on tape T2 was 40,000 + 50,000 = 90,000 records. This took place in 2 passes.

On the other hand, Merge pattern A read 50,000 + 50,000 + 50,000 = 150,000 records in three
passes over the data, to obtain the final run on tape T3.


Problem 17.6 There are 5 runs distributed on three tapes (T1, T2, T3) as shown below. A
standby tape (T4) is available. The internal memory capacity is 10,000 records. Undertake a balanced
P-way merge devising a smart merge pattern for some P.

Tape T1: R1...R10000   R30001...R40000
Tape T2: R10001...R20000   R40001...R50000
Tape T3: R20001...R30000


Solution: We first undertake a 3-way merge on the tapes T1, T2 and T3 for the first three runs
on the tapes. T4 is used as the output tape. The configuration at the end of pass 1 is as shown
below:

Tape T4: R1...R30000
Tape T1: R1...R10000 ↑ R30001...R40000
Tape T2: R10001...R20000 ↑ R40001...R50000

Tapes T3 and T4 alone are rewound.
In the final pass, a 3-way merge is undertaken on tapes T4, T1, T2. The output is delivered on
tape T3 as shown below:

Tape T3: R1...R50000

The merge pattern for the specific case is efficient since only 30,000 + 50,000 = 80,000 records were
read in the two passes put together for the final sort of the file.


Problem 17.7 Let us suppose a source file was initially sorted to generate 55 runs of size
1 (1^55). Trace polyphase merge on 3 tapes (T1, T2, T3) undertaking a 2-way merge during each
phase.


Solution: Table I 17.7 shows the redistribution of runs on the tapes in each phase. Observe how
the initial distribution of runs follows the Fibonacci number sequence. The polyphase
merged file is available as a single run on one tape in the final phase.







Table I 17.7 Polyphase merge on 3 tapes: redistribution of runs
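
The redistribution of runs phase by phase can be regenerated with a small simulation. This is a sketch under stated assumptions, not the table from the text: the 55 runs are assumed to be distributed as 1^21 on T1 and 1^34 on T2 with T3 empty, each tape is modelled as a queue of run sizes, and the names `polyphase_trace` and `describe` are hypothetical.

```python
from collections import deque


def polyphase_trace(tapes):
    """Simulate a 3-tape polyphase merge; return the tape contents after
    the initial distribution and after every phase."""
    tapes = [deque(t) for t in tapes]
    history = [[list(t) for t in tapes]]
    while sum(len(t) for t in tapes) > 1:
        out = next(i for i, t in enumerate(tapes) if not t)
        ins = [i for i in range(3) if i != out]
        if not all(tapes[i] for i in ins):
            raise ValueError("not a perfect polyphase distribution")
        # 2-way merge until one of the two input tapes runs dry.
        while all(tapes[i] for i in ins):
            tapes[out].append(sum(tapes[i].popleft() for i in ins))
        history.append([list(t) for t in tapes])
    return history


def describe(tape):
    """Render a tape in the size^count notation, e.g. 2^21."""
    return f"{tape[0]}^{len(tape)}" if tape else "-"


if __name__ == "__main__":
    # Assumed initial distribution: 21 runs on T1, 34 runs on T2.
    trace = polyphase_trace([[1] * 21, [1] * 34, []])
    for phase, state in enumerate(trace):
        print(phase, [describe(t) for t in state])
```

Under this assumed starting layout the trace takes eight merging phases and ends with the single run 55^1; a different assignment of the two starting counts to tapes changes which tape holds the final run, but not the number of phases.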


Problem 17.8 There are 6 tapes (T1, T2, T3, T4, T5, T6) using which 190 runs of length 1 (1^190)
are to be cascade merge sorted, to generate the final run (190^1). Trace the steps of the sorting
process.


Solution: Table I 17.8 illustrates the run distribution of cascade merge.

Table I 17.8 Run distribution on 6 tapes by cascade merge
(table body not recoverable from the source; its rows list the initial distribution followed by
the distribution after each pass)





Problem 17.9 Demonstrate balanced 3-way merge on the following “sample” list of keys 
available on a disk, with the internal memory capable of holding 6 keys: 
12 1 65 7 34 15 90 22 63 56 18 3 9 22 12 88 41 


Solution: The internal sort of the list yields three runs as follows:

R1: 1 7 12 15 34 65

R2: 3 18 22 56 63 90

R3: 9 12 22 41 88
We divide the internal memory into three input buffers and an output buffer to undertake the 
3-way merge. Thus while the input buffers can hold one key each, we shall allow the output 
buffer to hold a maximum of 3 keys. Thus the input data will be read in blocks of one key each. 
During the merge, the output buffer releases blocks containing 3 keys each which are written onto 
the run. The merging passes are shown below: 





Pass 1

Run R1: 1 7 12 15 34 65
Run R2: 3 18 22 56 63 90
Run R3: 9 12 22 41 88

Output: 1 3 7 | 9 12 12 | 15 18 22 | 22 34 41 | 56 63 65 | 88 90


At the end of pass 1 the entire list is sorted. 
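
The 3-way merge of this problem can be sketched in Python as follows. Two points are assumptions of the sketch rather than the text's method: a binary heap from the standard `heapq` module stands in for the selection tree, and the function name `k_way_merge` and its `block_size` parameter are hypothetical. Each heap entry plays the role of a one-key input buffer, and the output buffer is flushed whenever it holds `block_size` keys.

```python
import heapq


def k_way_merge(runs, block_size=3):
    """Merge sorted runs; return the output as blocks of `block_size` keys."""
    # One entry per run: (current key, run index, position within the run).
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    blocks, buffer = [], []
    while heap:
        key, i, pos = heapq.heappop(heap)
        buffer.append(key)
        if len(buffer) == block_size:      # output buffer full: write a block
            blocks.append(buffer)
            buffer = []
        if pos + 1 < len(runs[i]):         # refill the input buffer of run i
            heapq.heappush(heap, (runs[i][pos + 1], i, pos + 1))
    if buffer:                             # flush the last, partial block
        blocks.append(buffer)
    return blocks


if __name__ == "__main__":
    r1 = [1, 7, 12, 15, 34, 65]
    r2 = [3, 18, 22, 56, 63, 90]
    r3 = [9, 12, 22, 41, 88]
    print(" | ".join(" ".join(map(str, b)) for b in k_way_merge([r1, r2, r3])))
```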


Review Questions


1. (i) Cascade merge sort adopts uniform merge patterns in its passes 
(ii) The distribution of runs in the last pass of cascade merge sort is given by a pattern 
such as (1, 0, 0, ...0) 
(a) (i) true (ii) true (b) (i) true (ii) false (c) (i) false (ii) false (d) (i) false (ii) true 


2. Polyphase merge sort for a k-way merge on tapes requires ____ tapes 
(a) 2.k (b) (k-2) (c) (k+1) (d) k 
3. The time taken for the right sector to appear under the read / write head is known as 
(a) seek time (b) latency time (c) transmission time (d) data read time 
4. In the case of a balanced 2-way merge, if M runs were produced in the internal sorting phase
and if 2^(k-1) < M <= 2^k, then the sort procedure makes ____ merging passes over the
data records.
(a) M (b) ⌈log_2 M⌉ (c) ⌈log_M 2⌉ (d) M^2
5. Match the following: 
W. Magnetic tape A. tree of winners 


X. Magnetic disks B. Fibonacci merge 
Y. Polyphase merge C. Inter Block Gap 
Z. k-way merge D. platters 
(a) (W A) (X B) (Y D) (Z C) (b) (W C) (X D) (Y B) (Z A)
(c) (W C) (X D) (Y A) (Z B) (d) (W A) (X B) (Y C) (Z D)

6. What is the general principle behind external sorting?

7. How is a selection tree useful in a k-way merge?

8. What are the advantages of Polyphase merge sort over balanced k-way merge sort?

9. What is the principle behind the distribution of runs in a cascade merge sort?

10. How is data organized in a magnetic disk?

11. An inventory record contains the following fields: ITEM NUMBER (8 bytes), NAME
(20 bytes), DESCRIPTION (20 bytes), TOTAL STOCK (10 bytes), PRICE (10 bytes), TOTAL
PRICE (14 bytes).

A record comprising the data on item number, name, description and total stock is to be
read and, based on the current price which is input, the total price is to be computed and
updated in the fields. There are 25,000 records to be processed. Assuming the disk
characteristics given in Table I 17.1,
(i) How much storage space is required to store the entire file in the disk (in terms of
bytes/KB/MB)?




(ii) How much storage space is required to store the entire file in the disk in terms of 
cylinders? 
(iii) What is the time required to read, process and write back a given sector of records into 
the disk, assuming that it takes 100 microseconds to process a record? 
(iv) What is the time required to read, process and write back an entire track of records if 
they were read sequentially sector after sector? 
(v) What is the time required to read, process and write back an entire cylinder of records? 
(vi) What is the time required to read, process and write back the records in the next 
(immediate) cylinder? 
(vii) What is the time required to read, process and write back the entire file onto the disk? 
12. A file comprising 500,000 records is to be sorted. The internal memory has a capacity to
hold only 50,000 records. Trace the steps of a Balanced k-way merge for (i) k = 2 and
(ii) k = 4, when (a) the file is available on a tape and (b) the file is available on a disk. Assume
the availability of any number of tapes and a scratch disk for undertaking the appropriate
sorting process.


Programming Assignments


1. Implement a function to construct a tree of winners to obtain the smallest key from a list 
of keys representing its external nodes. 

2. Implement a function to construct a tree of losers to obtain the smallest key from a list of 
keys representing its external nodes. 

3. Making use of the function(s) developed in Programming Assignments 1 and 2 (Chapter 17), 
implement k-way merge algorithms for any given value of k. 

4. Implement Balanced k-way merge sort for disk based files. Simulate the program for various 
sizes of files, internal memory capacity and choice of k. Graphically display the distribution 
of runs. 




